Ternary Policy Iteration Algorithm for Nonlinear Robust Control
- URL: http://arxiv.org/abs/2007.06810v1
- Date: Tue, 14 Jul 2020 04:31:28 GMT
- Title: Ternary Policy Iteration Algorithm for Nonlinear Robust Control
- Authors: Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming
Yin
- Abstract summary: This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties.
The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies.
- Score: 12.392480840842728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The uncertainties in plant dynamics remain a challenge for nonlinear control
problems. This paper develops a ternary policy iteration (TPI) algorithm for
solving nonlinear robust control problems with bounded uncertainties. The
controller and uncertainty of the system are considered as game players, and
the robust control problem is formulated as a two-player zero-sum differential
game. In order to solve the differential game, the corresponding
Hamilton-Jacobi-Isaacs (HJI) equation is then derived. Three loss functions and
three update phases are designed to match the identity, the minimization, and
the maximization of the HJI equation, respectively. These loss functions are
defined as the expectation of the approximate Hamiltonian over a generated
state set, which avoids operating on all states of the state space at once.
The parameters of value function and policies are directly updated by
diminishing the designed loss functions using the gradient descent method.
Moreover, zero-initialization can be applied to the parameters of the control
policy. The effectiveness of the proposed TPI algorithm is demonstrated through
two simulation studies. The simulation results show that the TPI algorithm can
converge to the optimal solution for the linear plant, and has high resistance
to disturbances for the nonlinear plant.
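The three update phases above can be sketched on a toy scalar zero-sum problem. Everything below (the plant x_dot = a*x + u + d, the quadratic value function V(x) = w*x^2, and the linear policies) is an illustrative assumption, not the paper's experimental setup; only the structure of the three losses and the zero-initialization follows the abstract.

```python
import numpy as np

# Scalar plant x_dot = a*x + u + d with running cost q*x^2 + r*u^2 - gamma2*d^2.
# Assumed forms: V(x) = w*x^2, control u = th_u*x, disturbance d = th_d*x.
a, q, r, gamma2 = -1.0, 1.0, 1.0, 4.0
w, th_u, th_d = 0.0, 0.0, 0.0            # zero-initialization, as in the abstract
lr = 0.02
xs = np.linspace(-1.0, 1.0, 101)         # generated state set

for _ in range(10000):
    x2 = xs**2
    # Approximate Hamiltonian H = dV/dx*(a*x + u + d) + q*x^2 + r*u^2 - gamma2*d^2,
    # which here factors as H = x^2 * c with a scalar coefficient c.
    c = 2*w*(a + th_u + th_d) + q + r*th_u**2 - gamma2*th_d**2
    H = x2 * c
    # Phase 1: value update, minimize E[H^2] (residual of the HJI identity)
    w    -= lr * np.mean(2*H * 2*x2*(a + th_u + th_d))
    # Phase 2: control update, minimize E[H]
    th_u -= lr * np.mean(x2 * (2*w + 2*r*th_u))
    # Phase 3: disturbance update, maximize E[H]
    th_d += lr * np.mean(x2 * (2*w - 2*gamma2*th_d))

# For this toy game the game-algebraic Riccati equation 0.75*w^2 + 2*w - 1 = 0
# gives the known solution, against which the iterates can be checked.
w_star = (np.sqrt(7.0) - 2.0) / 1.5
```

With these illustrative gains the iterates settle at w close to w_star, with th_u near -w/r and th_d near w/gamma2, i.e. the minimizing controller and maximizing disturbance of the scalar game.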
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Value Approximation for Two-Player General-Sum Differential Games with State Constraints [24.012924492073974]
Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD).
While physics-informed neural networks (PINNs) have shown promise in alleviating CoD in solving PDEs, vanilla PINNs fall short in learning discontinuous solutions due to their sampling nature.
In this study, we explore three potential solutions to this challenge: (1) a hybrid learning method that is guided by both supervisory equilibria and the HJI PDE, (2) a value-hardening method
arXiv Detail & Related papers (2023-11-28T04:58:41Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use neural ordinary differential equations (Neural ODEs) to model continuous-time dynamics as differential equations parametrized with neural networks.
We propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems.
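As a minimal sketch of the idea, the controller below is a tiny parametric policy trained by differentiating through an Euler discretization of the closed loop; finite differences stand in for the adjoint/backpropagation used by Neural-ODE training in practice. The plant, horizon, cost, and policy form are assumptions for illustration, not the paper's benchmarks.

```python
import numpy as np

# Closed-loop rollout: Euler-integrate x_dot = 0.5*x + u with a small
# polynomial policy u = th0*x + th1*x**3, accumulating a quadratic cost.
def rollout_cost(theta, x0=1.0, dt=0.01, steps=200):
    x, cost = x0, 0.0
    for _ in range(steps):
        u = theta[0] * x + theta[1] * x**3
        cost += dt * (x**2 + 0.1 * u**2)
        x = x + dt * (0.5 * x + u)      # mildly unstable plant, stabilized by u
    return cost

# Gradient descent through the integrator via central finite differences.
theta = np.zeros(2)
for _ in range(200):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = 1e-5
        grad[i] = (rollout_cost(theta + e) - rollout_cost(theta - e)) / 2e-5
    theta -= 0.05 * np.clip(grad, -10.0, 10.0)  # clipping keeps early steps tame

final_cost = rollout_cost(theta)
```

The trained feedback policy drives the rollout cost well below that of the open-loop (zero-policy) system; in a real Neural ODE the polynomial policy would be a neural network and the finite differences would be exact sensitivities.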
arXiv Detail & Related papers (2022-10-20T13:19:26Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear
Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
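The local-versus-global contrast can be illustrated with two simple smoothers on a synthetic noisy record; the signal, noise level, and the particular smoothers (a 7-point moving average versus one global polynomial fit) are assumptions for illustration only, not the methods compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 200)
clean = np.sin(x)
noisy = clean + 0.1 * rng.standard_normal(x.size)

# Local smoother: each estimate uses only a 7-point neighbourhood.
kernel = np.ones(7) / 7
local = np.convolve(noisy, kernel, mode="same")

# Global smoother: a single degree-7 polynomial fit to the entire record.
coeffs = np.polyfit(x, noisy, deg=7)
global_fit = np.polyval(coeffs, x)

interior = slice(10, -10)  # ignore moving-average edge artefacts
rmse_local = np.sqrt(np.mean((local[interior] - clean[interior]) ** 2))
rmse_global = np.sqrt(np.mean((global_fit[interior] - clean[interior]) ** 2))
```

On this smooth signal the global fit pools all 200 samples into 8 coefficients and suppresses noise much more than the 7-sample local average, matching the paper's general finding.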
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
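The structure these algorithms exploit has a concrete payoff: for control-affine dynamics x_dot = f(x) + B(x)*u with a quadratic action penalty u^T R u, the minimization inside the HJB equation has a closed-form solution, so no inner optimization loop is needed. A short sketch with made-up matrices (the numbers are illustrative, not from the paper):

```python
import numpy as np

# With x_dot = f(x) + B(x) u and cost rate q(x) + u^T R u, the Hamiltonian's
# minimizer over u is available in closed form: u* = -0.5 * R^{-1} B^T grad_V.
def optimal_action(B, R, grad_V):
    return -0.5 * np.linalg.solve(R, B.T @ grad_V)

B = np.array([[0.0], [1.0]])      # actuation enters the second state only
R = np.array([[2.0]])             # action penalty u^T R u
grad_V = np.array([1.5, -3.0])    # value gradient at some state (assumed)

u_star = optimal_action(B, R, grad_V)

# Sanity check: u* minimizes the u-dependent part of the Hamiltonian,
# h(u) = grad_V . (B u) + u^T R u.
def h(u):
    return grad_V @ (B @ u) + u @ R @ u
```

Here u_star works out to 0.75, and perturbing it in either direction increases h, confirming the closed-form minimizer.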
arXiv Detail & Related papers (2021-10-05T11:33:37Z) - Concurrent Learning Based Tracking Control of Nonlinear Systems using
Gaussian Process [2.7930955543692817]
This paper demonstrates the applicability of the combination of concurrent learning as a tool for parameter estimation and non-parametric Gaussian Process for online disturbance learning.
A control law is developed by using both techniques sequentially in the context of feedback linearization.
The closed-loop system stability for the nth-order system is proven using the Lyapunov stability theorem.
arXiv Detail & Related papers (2021-06-02T02:59:48Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Convergence and sample complexity of gradient methods for the model-free
linear quadratic regulator problem [27.09339991866556]
We consider model-free methods that search for an optimal control for an unknown dynamical system by directly searching over the corresponding space of controllers.
We take a step towards demystifying the performance and efficiency of such methods by focusing on the gradient-flow dynamics over the set of stabilizing feedback gains; a similar result holds for the forward discretization of the ODE.
arXiv Detail & Related papers (2019-12-26T16:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.