Derivative-Free Policy Optimization for Risk-Sensitive and Robust
Control Design: Implicit Regularization and Sample Complexity
- URL: http://arxiv.org/abs/2101.01041v1
- Date: Mon, 4 Jan 2021 16:00:46 GMT
- Title: Derivative-Free Policy Optimization for Risk-Sensitive and Robust
Control Design: Implicit Regularization and Sample Complexity
- Authors: Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, Tamer Başar
- Abstract summary: Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
- Score: 15.940861063732608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct policy search serves as one of the workhorses in modern reinforcement
learning (RL), and its applications in continuous control tasks have recently
attracted increasing attention. In this work, we investigate the convergence
theory of policy gradient (PG) methods for learning the linear risk-sensitive
and robust controller. In particular, we develop PG methods that can be
implemented in a derivative-free fashion by sampling system trajectories, and
establish both global convergence and sample complexity results in the
solutions of two fundamental settings in risk-sensitive and robust control: the
finite-horizon linear exponential quadratic Gaussian, and the finite-horizon
linear-quadratic disturbance attenuation problems. As a by-product, our results
also provide the first sample complexity for the global convergence of PG
methods on solving zero-sum linear-quadratic dynamic games, a
nonconvex-nonconcave minimax optimization problem that serves as a baseline
setting in multi-agent reinforcement learning (MARL) with continuous spaces.
One feature of our algorithms is that during the learning phase, a certain
level of robustness/risk-sensitivity of the controller is preserved; we term this
the implicit regularization property, and it is an essential requirement
in safety-critical control systems.
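The derivative-free idea in the abstract can be illustrated on a toy problem: a policy gradient is estimated only from the costs of sampled trajectories. What follows is a minimal sketch under stated assumptions, not the authors' algorithm. It uses a hypothetical scalar system x_{t+1} = a x_t + b u_t + w_t with a = b = 1 and unit quadratic weights. The controls are time-varying linear gains u_t = -k_t x_t, and the gradient comes from a common two-point zeroth-order estimator whose directions are drawn uniformly from the unit sphere. The `beta > 0` branch of `objective` returns a Monte Carlo estimate of the finite-horizon LEQG cost (2/β) log E[exp((β/2) C)], where C is the total quadratic cost of one trajectory. The descent loop, however, only minimizes the risk-neutral cost (β = 0). The sketch does not reproduce the paper's LEQG or disturbance-attenuation algorithms, nor its implicit-regularization guarantees. All function names and numeric settings (smoothing radius, step size, batch sizes) are illustrative choices.

```python
import math
import random

# Hypothetical toy model (illustrative assumption, not from the paper):
# x_{t+1} = A x_t + B u_t + w_t, u_t = -k_t x_t, stage cost Q x^2 + R u^2,
# terminal cost Q_FINAL x^2, x_0 ~ N(0, 1), w_t ~ N(0, SIGMA^2).
A, B, Q, R, Q_FINAL, SIGMA = 1.0, 1.0, 1.0, 1.0, 1.0, 0.5


def sample_noises(rng, horizon, batch):
    """Draw `batch` pairs (x_0, [w_0, ..., w_{horizon-1}])."""
    return [(rng.gauss(0.0, 1.0), [rng.gauss(0.0, SIGMA) for _ in range(horizon)])
            for _ in range(batch)]


def trajectory_costs(gains, noises):
    """Roll out the gains on each sampled noise sequence; return each total cost."""
    costs = []
    for x, ws in noises:
        c = 0.0
        for k, w in zip(gains, ws):
            u = -k * x
            c += Q * x * x + R * u * u
            x = A * x + B * u + w
        costs.append(c + Q_FINAL * x * x)
    return costs


def objective(gains, noises, beta=0.0):
    """Sample mean cost if beta == 0, else the LEQG-style estimate
    (2 / beta) * log(mean(exp(beta / 2 * C))), computed with a max shift for stability."""
    costs = trajectory_costs(gains, noises)
    if beta == 0.0:
        return sum(costs) / len(costs)
    z = [0.5 * beta * c for c in costs]
    m = max(z)
    return (2.0 / beta) * (m + math.log(sum(math.exp(v - m) for v in z) / len(z)))


def zeroth_order_grad(gains, rng, radius=0.05, directions=10, batch=40, beta=0.0):
    """Two-point estimate of the smoothed gradient:
    (d / (2 r)) * mean over U of [J(K + rU) - J(K - rU)] * U, U uniform on the unit sphere."""
    d = len(gains)
    grad = [0.0] * d
    for _ in range(directions):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        u = [v / norm for v in g]
        noises = sample_noises(rng, d, batch)  # same samples for both evaluations
        plus = objective([k + radius * ui for k, ui in zip(gains, u)], noises, beta)
        minus = objective([k - radius * ui for k, ui in zip(gains, u)], noises, beta)
        scale = d * (plus - minus) / (2.0 * radius * directions)
        grad = [gi + scale * ui for gi, ui in zip(grad, u)]
    return grad


def exact_cost(gains):
    """Exact expected (risk-neutral) cost via the state-variance recursion; used only to check."""
    p, total = 1.0, 0.0
    for k in gains:
        total += (Q + R * k * k) * p
        p = (A - B * k) ** 2 * p + SIGMA ** 2
    return total + Q_FINAL * p


def riccati_gains(horizon):
    """Optimal finite-horizon LQR gains k_0, ..., k_{horizon-1} from the backward Riccati recursion."""
    p, gains = Q_FINAL, []
    for _ in range(horizon):
        k = A * B * p / (R + B * B * p)
        gains.append(k)
        p = Q + A * A * p - A * B * p * k
    return gains[::-1]


# Derivative-free policy gradient on the risk-neutral cost, starting from zero gains.
rng = random.Random(0)
HORIZON = 5
gains = [0.0] * HORIZON
for _ in range(300):
    grad = zeroth_order_grad(gains, rng)
    gains = [k - 0.01 * gk for k, gk in zip(gains, grad)]

print("learned gains:", [round(k, 3) for k in gains])
print("Riccati gains:", [round(k, 3) for k in riccati_gains(HORIZON)])
print("exact cost:", exact_cost([0.0] * HORIZON), "->", exact_cost(gains))  # starts at 9.75
```

The descent loop only queries `objective` on sampled trajectories. `exact_cost` and `riccati_gains` exist purely to check the result, which is possible only because this toy model is known. One noise batch is reused for both J(K + rU) and J(K - rU) (common random numbers), so Monte Carlo noise does not swamp the finite difference.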
Related papers
- Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
Date: 2024-08-05T14:11:51Z
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
Date: 2024-07-15T14:54:57Z
- Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs).
We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies.
We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
Date: 2024-05-22T15:39:05Z
- Real-Time Adaptive Safety-Critical Control with Gaussian Processes in
High-Order Uncertain Models [14.790031018404942]
This paper presents an adaptive online learning framework for systems with uncertain parameters.
We first integrate a forgetting factor to refine a variational sparse GP algorithm.
In the second phase, we propose a safety filter based on high-order control barrier functions.
Date: 2024-02-29T08:25:32Z
- High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
Date: 2023-05-30T12:58:39Z
- Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional nonlinear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
Date: 2022-07-21T11:14:47Z
- Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
Date: 2021-05-24T02:21:34Z
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a
Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
Date: 2020-11-20T09:51:49Z
- Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
Date: 2020-11-14T01:27:32Z
- Robust Reinforcement Learning: A Case Study in Linear Quadratic
Regulation [23.76925146112261]
This paper studies the robustness of reinforcement learning algorithms to errors in the learning process.
It is shown that policy iteration for LQR is inherently robust to small errors in the learning process.
Date: 2020-08-25T11:11:28Z
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
Date: 2020-03-12T19:56:38Z