Index
Preface
What this book covers | What you need for this book | Who this book is for | Conventions | Reader feedback | Customer support
Downloading the example code | Errata | Piracy | Questions
Reinforcement Learning
Overview of machine learning
What is machine learning?
Speech conversion from one language to another | Suspicious activity detection from CCTVs | Medical diagnostics for detecting diseases
Supervised learning | Unsupervised learning | Reinforcement learning
Introduction to reinforcement learning
Positive reinforcement learning | Negative reinforcement learning | Applications of reinforcement learning | Self-driving cars | Drone autonomous aerial taxi | Aerobatics autonomous helicopter | TD-Gammon – computer game | AlphaGo
The agent environment setup
Exploration versus exploitation | Neural network and reinforcement learning | Reinforcement learning frameworks/toolkits
OpenAI Gym
Getting Started with OpenAI Gym | Docker
Docker installation on Windows environment | Docker installation on a Linux environment
Running an environment
Brown-UMBC Reinforcement Learning and Planning
Walkthrough with Hello GridWorld | Hello GridWorld project
Summary
Markov Decision Process
Introduction to MDP
State | Action | Model | Reward | Policy | MDP - more about rewards | Optimal policy | More about policy
Bellman equation | A practical example of building an MDP domain
GridWorld | Terminal states | Java interfaces for MDP definitions
Single-agent domain | State | Action | Action type | SampleModel | Environment | EnvironmentOutcome | TransitionProb
Defining a GridWorld state | Defining a GridWorld model | Creating the state visualizer | Testing it out
Markov chain | Building an object-oriented MDP domain | Summary
Dynamic Programming
Learning and planning | Evaluating a policy | Value iteration
Value iteration implementation using BURLAP
Output of the value iteration
Policy iteration | Bellman equations
The relationship between Bellman equations
Summary
Temporal Difference Learning
Introducing TD learning | TD lambda | Estimating from data | Learning rate
Properties of learning rate
Overview of TD(1)
An example of TD(1) | Why TD(1) is wrong
Overview of TD(0) | TD lambda rule | K-step estimator | Relationship between k-step estimators and TD lambda | Summary
Monte Carlo Methods
Monte Carlo methods
First visit Monte Carlo | Example – Blackjack
Objective of the game | Card scoring/values | The deal | Naturals | The gameplay | Applying the Monte Carlo approach | Blackjack game implementation
Monte Carlo for control
Monte Carlo Exploring Starts | Example – Blackjack
Summary
Learning and Planning
Q-learning
Q-learning example by hand | Value iteration
Testing the value iteration code
Q-learning code
Testing Q-learning code | Output of the Q-learning program
Summary
Deep Reinforcement Learning
What is a neural network?
A single neuron | Feed-forward neural network | Multi-Layer Perceptron
Deep learning | Deep Q Network
Experience replay | The DQN algorithm | DQN example – PyTorch and Gym
Task
Packages | Replay memory | Q-network | Input extraction | Training | Training loop
Example – Flappy Bird using Keras
Dependencies | qlearn.py | Game screen input | Image preprocessing | Convolution Neural Network | DQN implementation | Complete code
Output
Summary
Game Theory
Introduction to game theory
Example of game theory | Minimax | Fundamental results | Game tree | von Neumann theorem | Mini Poker game | Mixed strategies
OpenAI Gym examples
Agents | Environments
Example 1 – simple random agent | Example 2 – learning agent | Example 3 – keyboard learning agent
Summary
Reinforcement Learning Showdown
Reinforcement learning frameworks
PyBrain
Setup | Ready to code | Environment | Agent | Task | Experiment
RLPy
Setup | Ready to code
Maja Machine Learning Framework
Setup
RL-Glue
Setup
RL-Glue components | Sample project | sample_sarsa_agent.py | sample_mines_environment.py | sample_experiment.py
Mindpark
Setup
Summary
Applications and Case Studies – Reinforcement Learning
Inverse Reinforcement Learning
IRL algorithm | Implementing a car obstacle avoidance problem
Results and observations
Partially Observable Markov Decision Process
POMDP example | State estimator | Value iteration in POMDP | Reinforcement learning for POMDP
Summary
Current Research – Reinforcement Learning
Hierarchical reinforcement learning
Advantages of hierarchical reinforcement learning | The SMDP model | Hierarchical RL model
Reinforcement learning with hierarchies of abstract machines
HAM framework | Running a HAM algorithm | HAM for mobile robot example | HAM for a RoboCup keepaway example
MAXQ value function decomposition
Taxi world example | Decomposition of the projected value function
Summary
Article
State | Action | Model | Reward | Policy | MDP - more about rewards | Optimal policy | More about Policy | Summary
