Index
Cover
Series Page
Title Page
Copyright
Preface
Contributors

Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
    1.1 Introduction
    1.2 What is RLADP?
    1.3 Some Basic Challenges in Implementing ADP
    Disclaimer
    References
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
    2.1 Introduction
    2.2 Background
    2.3 Stability Bias
    2.4 Example Application
    References
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
    3.1 Background Material
    3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
    3.3 Generalization
    3.4 Simulation Studies
    3.5 Summary
    References
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
    4.1 Introduction
    4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
    4.3 Case Study: The Ball-and-Beam System
    4.4 Conclusions and Future Work
    Acknowledgments
    References
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
    5.1 Introduction
    5.2 Approximate Dynamic Programming
    5.3 SNAC
    5.4 J-SNAC
    5.5 Finite-SNAC
    5.6 Conclusions
    Acknowledgments
    References
Chapter 6: Linearly Solvable Optimal Control
    6.1 Introduction
    6.2 Linearly Solvable Optimal Control Problems
    6.3 Extension to Risk-Sensitive Control and Game Theory
    6.4 Properties and Algorithms
    6.5 Conclusions and Future Work
    References
Chapter 7: Approximating Optimal Control with Value Gradient Learning
    7.1 Introduction
    7.2 Value Gradient Learning and BPTT Algorithms
    7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
    7.4 Vertical Lander Experiment
    7.5 Conclusions
    References
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
    8.1 Background
    8.2 Constrained Backpropagation (CPROP) Approach
    8.3 Solution of Partial Differential Equations in Nonstationary Environments
    8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
    8.5 Summary
    Algebraic ANN Control Matrices
    References
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
    9.1 Introduction
    9.2 Direct Heuristic Dynamic Programming
    9.3 A Control Theoretic View on the Direct HDP
    9.4 Direct HDP Design with Improved Performance Case 1: Design Guided by a Priori LQR Information
    9.5 Direct HDP Design with Improved Performance Case 2: Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
    9.6 Summary
    Acknowledgment
    References
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
    10.1 Introduction
    10.2 Q-Learning
    10.3 Sampled Data Q-Learning
    10.4 System Dynamics Approximation
    10.5 Closing Remarks
    References
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
    11.1 Introduction
    11.2 Background
    11.3 Reinforcement Learning Based Control
    11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
    11.5 Simulation Result
    References
Chapter 12: An Actor–Critic–Identifier Architecture for Adaptive Approximate Optimal Control
    12.1 Introduction
    12.2 Actor–Critic–Identifier Architecture for HJB Approximation
    12.3 Actor–Critic Design
    12.4 Identifier Design
    12.5 Convergence and Stability Analysis
    12.6 Simulation
    12.7 Conclusion
    References
Chapter 13: Robust Adaptive Dynamic Programming
    13.1 Introduction
    13.2 Optimality Versus Robustness
    13.3 Robust-ADP Design for Disturbance Attenuation
    13.4 Robust-ADP for Partial-State Feedback Control
    13.5 Applications
    13.6 Summary
    Acknowledgment
    References
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
    14.1 Introduction
    14.2 Two-Person Game
    14.3 Learning in NZSGs
    14.4 Main Results
    14.5 Security Application
    14.6 Conclusions and Future Works
    Appendix: Assumptions for Stochastic Approximation
    References
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
    15.1 Introduction
    15.2 Two-Player Games and Integral Reinforcement Learning
    15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
    15.4 Online Algorithm to Solve Nonzero-Sum Games
    15.5 Analysis of the Online Learning Algorithm for NZS Games
    15.6 Simulation Result for the Online Game Algorithm
    15.7 Conclusion
    References
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
    16.1 Introduction
    16.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation
    16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton–Jacobi–Isaacs Equation
    16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
    References
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
    17.1 Introduction
    17.2 Lambda-Policy Iteration without Cost Function Approximation
    17.3 Approximate Policy Evaluation Using Projected Equations
    17.4 Lambda-Policy Iteration with Cost Function Approximation
    17.5 Conclusions
    Acknowledgments
    References
Chapter 18: Optimal Learning and Approximate Dynamic Programming
    18.1 Introduction
    18.2 Modeling
    18.3 The Four Classes of Policies
    18.4 Basic Learning Policies for Policy Search
    18.5 Optimal Learning Policies for Policy Search
    18.6 Learning with a Physical State
    References
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
    19.1 Introduction
    19.2 Literature Review
    19.3 Problem Formulation
    19.4 Policy Iteration for EBO
    19.5 Example: Material Handling Problem
    19.6 Conclusions
    Acknowledgments
    References
Chapter 20: Bounds for Markov Decision Processes
    20.1 Introduction
    20.2 Problem Formulation
    20.3 The Linear Programming Approach
    20.4 The Martingale Duality Approach
    20.5 The Pathwise Optimization Method
    20.6 Applications
    20.7 Conclusion
    References
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
    21.1 Introduction: Timescales Fundamentals
    21.2 Dynamic Programming
    21.3 Backpropagation
    21.4 Conclusions
    Acknowledgments
    References
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
    22.1 Introduction
    22.2 Optimistic Online Optimization
    22.3 Optimistic Planning Algorithms
    22.4 Related Planning Algorithms
    22.5 Numerical Example
    References
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
    23.1 Introduction
    23.2 The Framework
    23.3 The Feature Adaptation Scheme
    23.4 Convergence Analysis
    23.5 Application to Traffic Signal Control
    23.6 Conclusions
    References
Chapter 24: Feature Selection for Neuro-Dynamic Programming
    24.1 Introduction
    24.2 Optimality Equations
    24.3 Neuro-Dynamic Algorithms
    24.4 Fluid Models
    24.5 Diffusion Models
    24.6 Mean Field Games
    24.7 Conclusions
    References
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
    25.1 Introduction
    25.2 Petroleum Reservoir Production Optimization Problem
    25.3 Review of Dynamic Programming and Approximate Dynamic Programming
    25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
    25.5 Simulation Results
    25.6 Concluding Remarks
    Acknowledgments
    References
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
    26.1 Introduction
    26.2 Reinforcement Learning
    26.3 Light-Following Robot
    26.4 Simulation Results
    26.5 Experimental Results
    26.6 Conclusions and Future Work
    Acknowledgments
    References
Index
IEEE Press Series on Computational Intelligence