Index
Cover
Series Page
Title Page
Copyright
Preface
Contributors

Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
    1.1 Introduction
    1.2 What is RLADP?
    1.3 Some Basic Challenges in Implementing ADP
    Disclaimer
    References
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
    2.1 Introduction
    2.2 Background
    2.3 Stability Bias
    2.4 Example Application
    References
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
    3.1 Background Material
    3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
    3.3 Generalization
    3.4 Simulation Studies
    3.5 Summary
    References
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
    4.1 Introduction
    4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
    4.3 Case Study: The Ball-and-Beam System
    4.4 Conclusions and Future Work
    Acknowledgments
    References
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
    5.1 Introduction
    5.2 Approximate Dynamic Programming
    5.3 SNAC
    5.4 J-SNAC
    5.5 Finite-SNAC
    5.6 Conclusions
    Acknowledgments
    References
Chapter 6: Linearly Solvable Optimal Control
    6.1 Introduction
    6.2 Linearly Solvable Optimal Control Problems
    6.3 Extension to Risk-Sensitive Control and Game Theory
    6.4 Properties and Algorithms
    6.5 Conclusions and Future Work
    References
Chapter 7: Approximating Optimal Control with Value Gradient Learning
    7.1 Introduction
    7.2 Value Gradient Learning and BPTT Algorithms
    7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
    7.4 Vertical Lander Experiment
    7.5 Conclusions
    References
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
    8.1 Background
    8.2 Constrained Backpropagation (CPROP) Approach
    8.3 Solution of Partial Differential Equations in Nonstationary Environments
    8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
    8.5 Summary
    Algebraic ANN Control Matrices
    References
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
    9.1 Introduction
    9.2 Direct Heuristic Dynamic Programming
    9.3 A Control Theoretic View on the Direct HDP
    9.4 Direct HDP Design with Improved Performance Case 1: Design Guided by a Priori LQR Information
    9.5 Direct HDP Design with Improved Performance Case 2: Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
    9.6 Summary
    Acknowledgment
    References
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
    10.1 Introduction
    10.2 Q-Learning
    10.3 Sampled Data Q-Learning
    10.4 System Dynamics Approximation
    10.5 Closing Remarks
    References
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
    11.1 Introduction
    11.2 Background
    11.3 Reinforcement Learning Based Control
    11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
    11.5 Simulation Result
    References
Chapter 12: An Actor–Critic–Identifier Architecture for Adaptive Approximate Optimal Control
    12.1 Introduction
    12.2 Actor–Critic–Identifier Architecture for HJB Approximation
    12.3 Actor–Critic Design
    12.4 Identifier Design
    12.5 Convergence and Stability Analysis
    12.6 Simulation
    12.7 Conclusion
    References
Chapter 13: Robust Adaptive Dynamic Programming
    13.1 Introduction
    13.2 Optimality Versus Robustness
    13.3 Robust-ADP Design for Disturbance Attenuation
    13.4 Robust-ADP for Partial-State Feedback Control
    13.5 Applications
    13.6 Summary
    Acknowledgment
    References
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
    14.1 Introduction
    14.2 Two-Person Game
    14.3 Learning in NZSGs
    14.4 Main Results
    14.5 Security Application
    14.6 Conclusions and Future Works
    Appendix: Assumptions for Stochastic Approximation
    References
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
    15.1 Introduction
    15.2 Two-Player Games and Integral Reinforcement Learning
    15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
    15.4 Online Algorithm to Solve Nonzero-Sum Games
    15.5 Analysis of the Online Learning Algorithm for NZS Games
    15.6 Simulation Result for the Online Game Algorithm
    15.7 Conclusion
    References
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
    16.1 Introduction
    16.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation
    16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton–Jacobi–Isaacs Equation
    16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
    References
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
    17.1 Introduction
    17.2 Lambda-Policy Iteration without Cost Function Approximation
    17.3 Approximate Policy Evaluation Using Projected Equations
    17.4 Lambda-Policy Iteration with Cost Function Approximation
    17.5 Conclusions
    Acknowledgments
    References
Chapter 18: Optimal Learning and Approximate Dynamic Programming
    18.1 Introduction
    18.2 Modeling
    18.3 The Four Classes of Policies
    18.4 Basic Learning Policies for Policy Search
    18.5 Optimal Learning Policies for Policy Search
    18.6 Learning with a Physical State
    References
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
    19.1 Introduction
    19.2 Literature Review
    19.3 Problem Formulation
    19.4 Policy Iteration for EBO
    19.5 Example: Material Handling Problem
    19.6 Conclusions
    Acknowledgments
    References
Chapter 20: Bounds for Markov Decision Processes
    20.1 Introduction
    20.2 Problem Formulation
    20.3 The Linear Programming Approach
    20.4 The Martingale Duality Approach
    20.5 The Pathwise Optimization Method
    20.6 Applications
    20.7 Conclusion
    References
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
    21.1 Introduction: Timescales Fundamentals
    21.2 Dynamic Programming
    21.3 Backpropagation
    21.4 Conclusions
    Acknowledgments
    References
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
    22.1 Introduction
    22.2 Optimistic Online Optimization
    22.3 Optimistic Planning Algorithms
    22.4 Related Planning Algorithms
    22.5 Numerical Example
    References
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
    23.1 Introduction
    23.2 The Framework
    23.3 The Feature Adaptation Scheme
    23.4 Convergence Analysis
    23.5 Application to Traffic Signal Control
    23.6 Conclusions
    References
Chapter 24: Feature Selection for Neuro-Dynamic Programming
    24.1 Introduction
    24.2 Optimality Equations
    24.3 Neuro-Dynamic Algorithms
    24.4 Fluid Models
    24.5 Diffusion Models
    24.6 Mean Field Games
    24.7 Conclusions
    References
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
    25.1 Introduction
    25.2 Petroleum Reservoir Production Optimization Problem
    25.3 Review of Dynamic Programming and Approximate Dynamic Programming
    25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
    25.5 Simulation Results
    25.6 Concluding Remarks
    Acknowledgments
    References
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
    26.1 Introduction
    26.2 Reinforcement Learning
    26.3 Light-Following Robot
    26.4 Simulation Results
    26.5 Experimental Results
    26.6 Conclusions and Future Work
    Acknowledgments
    References
Index
IEEE Press Series on Computational Intelligence