Index
Title Page
Copyright and Credits
Reinforcement Learning with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long Short Term Memory Networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus
Summary
Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
Q-Learning
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Summary
Markov Decision Process
Markov decision processes
The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Summary
Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock, paper, scissors
Example 2 - state aliased grid-world
Policy objective functions
Policy Gradient Theorem
Temporal difference rule
TD(1) rule
TD(0) rule
TD(λ) rule
Policy gradients
The Monte Carlo policy gradient
Actor-critic algorithms
Using a baseline to reduce variance
Vanilla policy gradient
Agent learning pong using policy gradients
Summary
Q-Learning and Deep Q-Networks
Why reinforcement learning?
Model based learning and model free learning
Monte Carlo learning
Temporal difference learning
On-policy and off-policy learning
Q-learning
The exploration exploitation dilemma
Q-learning for the mountain car problem in OpenAI gym
Deep Q-networks
Using a convolution neural network instead of a single layer neural network
Use of experience replay
Separate target network to compute the target Q-values
Advancements in deep Q-networks and beyond
Double DQN
Dueling DQN
Deep Q-network for mountain car problem in OpenAI gym
Deep Q-network for Cartpole problem in OpenAI gym
Deep Q-network for Atari Breakout in OpenAI gym
The Monte Carlo tree search algorithm
Minimax and game trees
The Monte Carlo Tree Search
The SARSA algorithm
SARSA algorithm for mountain car problem in OpenAI gym
Summary
Asynchronous Methods
Why asynchronous methods?
Asynchronous one-step Q-learning
Asynchronous one-step SARSA
Asynchronous n-step Q-learning
Asynchronous advantage actor critic
A3C for Pong-v0 in OpenAI gym
Summary
Robo Everything – Real Strategy Gaming
Real-time strategy games
Reinforcement learning and other approaches
Online case-based planning
Drawbacks to real-time strategy games
Why reinforcement learning?
Reinforcement learning in RTS gaming
Deep autoencoder
How is reinforcement learning better?
Summary
AlphaGo – Reinforcement Learning at Its Best
What is Go?
Go versus chess
How did Deep Blue defeat Garry Kasparov?
Why is the game tree approach no good for Go?
AlphaGo – mastering Go
Monte Carlo Tree Search
Architecture and properties of AlphaGo
Energy consumption analysis – Lee Sedol versus AlphaGo
AlphaGo Zero
Architecture and properties of AlphaGo Zero
Training process in AlphaGo Zero 
Summary
Reinforcement Learning in Autonomous Driving
Machine learning for autonomous driving
Reinforcement learning for autonomous driving
Creating autonomous driving agents
Why reinforcement learning?
Proposed frameworks for autonomous driving
Spatial aggregation
Sensor fusion
Spatial features
Recurrent temporal aggregation
Planning
DeepTraffic – MIT simulator for autonomous driving
Summary
Financial Portfolio Management
Introduction
Problem definition
Data preparation
Reinforcement learning
Further improvements
Summary
Reinforcement Learning in Robotics
Reinforcement learning in robotics
Evolution of reinforcement learning
Challenges in robot reinforcement learning
High dimensionality problem
Real-world challenges
Issues due to model uncertainty
What's the final objective a robot wants to achieve?
Open questions and practical challenges
Open questions
Practical challenges for robotic reinforcement learning
Key takeaways
Summary
Deep Reinforcement Learning in Ad Tech
Computational advertising challenges and bidding strategies
Business models used in advertising
Sponsored-search advertisements
Search-advertisement management
Adwords
Bidding strategies of advertisers
Real-time bidding by reinforcement learning in display advertising
Summary
Reinforcement Learning in Image Processing
Hierarchical object detection with deep reinforcement learning
Related works
Region-based convolution neural networks
Spatial pyramid pooling networks
Fast R-CNN
Faster R-CNN
You Only Look Once
Single Shot Detector
Hierarchical object detection model
State
Actions
Reward
Model and training
Training specifics
Summary
Deep Reinforcement Learning in NLP
Text summarization
Deep reinforced model for Abstractive Summarization
Neural intra-attention model
Intra-temporal attention on input sequence while decoding
Intra-decoder attention
Token generation and pointer
Hybrid learning objective
Supervised learning with teacher forcing
Policy learning
Mixed training objective function
Text question answering
Mixed objective and deep residual coattention for Question Answering
Deep residual coattention encoder
Mixed objective using self-critical policy learning
Summary
Further topics in Reinforcement Learning
Continuous action space algorithms
Trust region policy optimization
Deterministic policy gradients
Scoring mechanism in sequential models in NLP
BLEU
What is BLEU score and what does it do?
ROUGE
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think