Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long short-term memory networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Markov Decision Process
Markov decision processes
The Markov property
The S state set
Transition model
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock, paper, scissors
Example 2 - state aliased grid-world
Policy objective functions
Policy Gradient Theorem
Temporal difference rule
TD(1) rule
TD(0) rule
TD(λ) rule
Policy gradients
The Monte Carlo policy gradient
Actor-critic algorithms
Using a baseline to reduce variance
Vanilla policy gradient
Agent learning Pong using policy gradients
Q-Learning and Deep Q-Networks
Why reinforcement learning?
Model-based learning and model-free learning
Monte Carlo learning
Temporal difference learning
On-policy and off-policy learning
The exploration-exploitation dilemma
Q-learning for the mountain car problem in OpenAI Gym
Deep Q-networks
Using a convolutional neural network instead of a single-layer neural network
Use of experience replay
Separate target network to compute the target Q-values
Advancements in deep Q-networks and beyond
Double DQN
Dueling DQN
Deep Q-network for the mountain car problem in OpenAI Gym
Deep Q-network for the CartPole problem in OpenAI Gym
Deep Q-network for Atari Breakout in OpenAI Gym
The Monte Carlo tree search algorithm
Minimax and game trees
The Monte Carlo Tree Search
The SARSA algorithm
SARSA algorithm for the mountain car problem in OpenAI Gym
Asynchronous Methods
Why asynchronous methods?
Asynchronous one-step Q-learning
Asynchronous one-step SARSA
Asynchronous n-step Q-learning
Asynchronous advantage actor-critic
A3C for Pong-v0 in OpenAI Gym
Robo Everything – Real Strategy Gaming
Real-time strategy games
Reinforcement learning and other approaches
Online case-based planning
Drawbacks to real-time strategy games
Why reinforcement learning?
Reinforcement learning in RTS gaming
Deep autoencoder
How is reinforcement learning better?
AlphaGo – Reinforcement Learning at Its Best
What is Go?
Go versus chess
How did Deep Blue defeat Garry Kasparov?
Why is the game tree approach no good for Go?
AlphaGo – mastering Go
Monte Carlo Tree Search
Architecture and properties of AlphaGo
Energy consumption analysis – Lee Sedol versus AlphaGo
AlphaGo Zero
Architecture and properties of AlphaGo Zero
Training process in AlphaGo Zero
Reinforcement Learning in Autonomous Driving
Machine learning for autonomous driving
Reinforcement learning for autonomous driving
Creating autonomous driving agents
Why reinforcement learning?
Proposed frameworks for autonomous driving
Spatial aggregation
Sensor fusion
Spatial features
Recurrent temporal aggregation
DeepTraffic – MIT simulator for autonomous driving
Financial Portfolio Management
Problem definition
Data preparation
Reinforcement learning
Further improvements
Reinforcement Learning in Robotics
Reinforcement learning in robotics
Evolution of reinforcement learning
Challenges in robot reinforcement learning
High dimensionality problem
Real-world challenges
Issues due to model uncertainty
What's the final objective a robot wants to achieve?
Open questions and practical challenges
Open questions
Practical challenges for robotic reinforcement learning
Key takeaways
Deep Reinforcement Learning in Ad Tech
Computational advertising challenges and bidding strategies
Business models used in advertising
Sponsored-search advertisements
Search-advertisement management
Bidding strategies of advertisers
Real-time bidding by reinforcement learning in display advertising
Reinforcement Learning in Image Processing
Hierarchical object detection with deep reinforcement learning
Related works
Region-based convolutional neural networks
Spatial pyramid pooling networks
Fast R-CNN
Faster R-CNN
You Only Look Once
Single Shot Detector
Hierarchical object detection model
Model and training
Training specifics
Deep Reinforcement Learning in NLP
Text summarization
Deep reinforced model for abstractive summarization
Neural intra-attention model
Intra-temporal attention on input sequence while decoding
Intra-decoder attention
Token generation and pointer
Hybrid learning objective
Supervised learning with teacher forcing
Policy learning
Mixed training objective function
Text question answering
Mixed objective and deep residual coattention for question answering
Deep residual coattention encoder
Mixed objective using self-critical policy learning
Further Topics in Reinforcement Learning
Continuous action space algorithms
Trust region policy optimization
Deterministic policy gradients
Scoring mechanism in sequential models in NLP
What is BLEU score and what does it do?
