Reinforcement learning for linear-convex models with jumps via stability
analysis of feedback controls
- URL: http://arxiv.org/abs/2104.09311v1
- Date: Mon, 19 Apr 2021 13:50:52 GMT
- Title: Reinforcement learning for linear-convex models with jumps via stability
analysis of feedback controls
- Authors: Xin Guo, Anran Hu, Yufei Zhang
- Abstract summary: We study finite-time horizon continuous-time linear-convex reinforcement learning problems in an episodic setting.
In this problem, the unknown linear jump-diffusion process is controlled subject to nonsmooth convex costs.
- Score: 7.969435896173812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study finite-time horizon continuous-time linear-convex reinforcement
learning problems in an episodic setting. In this problem, the unknown linear
jump-diffusion process is controlled subject to nonsmooth convex costs. We show
that the associated linear-convex control problems admit Lipschitz continuous
optimal feedback controls and further prove the Lipschitz stability of the
feedback controls, i.e., the performance gap between applying feedback controls
for an incorrect model and for the true model depends Lipschitz-continuously on
the magnitude of perturbations in the model coefficients; the proof relies on a
stability analysis of the associated forward-backward stochastic differential
equation. We then propose a novel least-squares algorithm which achieves a
regret of the order $O(\sqrt{N\ln N})$ on linear-convex learning problems with
jumps, where $N$ is the number of learning episodes; the analysis leverages the
Lipschitz stability of feedback controls and concentration properties of
sub-Weibull random variables.
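To make the learning loop concrete, below is a minimal, illustrative Python sketch of an episodic least-squares / certainty-equivalence scheme on a one-dimensional, Euler-discretised linear jump-diffusion. To keep the greedy feedback in closed form, the sketch specialises to quadratic costs (solved by a backward Riccati recursion), whereas the paper treats general nonsmooth convex costs and obtains the optimal feedback via the associated forward-backward SDE; all coefficient values, the discretisation, and the update schedule are assumptions for illustration rather than the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown ground-truth dynamics (Euler-discretised linear jump-diffusion):
#   dX_t = (A* X_t + B* u_t) dt + sigma dW_t + jump_size d(N_t - lambda t)
A_true, B_true = -0.3, 1.0                  # drift coefficients to be learned
sigma, jump_rate, jump_size = 0.2, 0.5, 0.1
T, n_steps = 1.0, 50
dt = T / n_steps
Q, R, QT = 1.0, 0.1, 1.0                    # quadratic stage/terminal cost weights

def riccati_feedback(A, B):
    """Greedy linear feedback u_k = -K[k] * x for the *estimated* model,
    computed by the standard discrete-time Riccati recursion."""
    a, b = 1.0 + A * dt, B * dt             # Euler-discretised dynamics x' = a x + b u
    q, r = Q * dt, R * dt
    P, K = QT, np.zeros(n_steps)
    for k in reversed(range(n_steps)):
        K[k] = (a * b * P) / (r + b**2 * P)
        P = q + a**2 * P - (a * b * P) ** 2 / (r + b**2 * P)
    return K

def run_episode(K):
    """Simulate one episode under feedback gains K; return data and realised cost."""
    x, cost, xs, us, dxs = 1.0, 0.0, [], [], []
    for k in range(n_steps):
        u = -K[k] * x
        dN = rng.poisson(jump_rate * dt)
        dx = ((A_true * x + B_true * u) * dt
              + sigma * np.sqrt(dt) * rng.standard_normal()
              + jump_size * (dN - jump_rate * dt))
        xs.append(x); us.append(u); dxs.append(dx)
        cost += (Q * x**2 + R * u**2) * dt
        x += dx
    return np.array(xs), np.array(us), np.array(dxs), cost + QT * x**2

# Episodic learning: act greedily for the current estimate, then refit by least squares.
X_all, U_all, dX_all, costs = [], [], [], []
A_hat, B_hat = 0.0, 0.5                     # arbitrary initial guesses
for episode in range(200):
    xs, us, dxs, cost = run_episode(riccati_feedback(A_hat, B_hat))
    X_all.append(xs); U_all.append(us); dX_all.append(dxs); costs.append(cost)
    # Regress observed increments dx on (x dt, u dt) to re-estimate (A, B).
    Phi = np.column_stack([np.concatenate(X_all), np.concatenate(U_all)]) * dt
    (A_hat, B_hat), *_ = np.linalg.lstsq(Phi, np.concatenate(dX_all), rcond=None)

print(f"estimated (A, B) = ({A_hat:.3f}, {B_hat:.3f}), true = ({A_true}, {B_true})")
print(f"mean cost, first 20 vs last 20 episodes: {np.mean(costs[:20]):.3f} vs {np.mean(costs[-20:]):.3f}")
```

The regret analysed in the paper compares the accumulated cost of such a scheme with that of the optimal feedback for the true model over $N$ episodes; the Lipschitz stability result is what turns the least-squares estimation error into a bound on this performance gap.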
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear–Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Stabilizing Extreme Q-learning by Maclaurin Expansion [51.041889588036895]
Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution.
It has demonstrated strong performance in both offline and online reinforcement learning settings.
We propose Maclaurin Expanded Extreme Q-learning to enhance stability; a rough illustrative sketch of this idea is given after this list.
arXiv Detail & Related papers (2024-06-07T12:43:17Z)
- On the stability of Lipschitz continuous control problems and its application to reinforcement learning [1.534667887016089]
We address the crucial yet underexplored stability properties of the Hamilton--Jacobi--Bellman (HJB) equation in model-free reinforcement learning contexts.
We bridge the gap between Lipschitz continuous optimal control problems and classical optimal control problems in the viscosity solutions framework.
arXiv Detail & Related papers (2024-04-20T08:21:25Z)
- Learning Over Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems (Extended Version) [1.2430809884830318]
This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems.
We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions.
We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.
arXiv Detail & Related papers (2023-04-12T23:55:56Z)
- Regret Bounds for Adaptive Nonlinear Control [14.489004143703825]
We prove the first finite-time regret bounds for adaptive nonlinear control with matched uncertainty in the stochastic setting.
We show that the regret suffered by certainty equivalence adaptive control, compared to an oracle controller with perfect knowledge of the unmodeled disturbances, is upper bounded by $\widetilde{O}(\sqrt{T})$ in expectation.
arXiv Detail & Related papers (2020-11-26T03:01:09Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
- Regret Minimization in Partially Observable Linear Quadratic Control [91.43582419264763]
We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori.
We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control.
arXiv Detail & Related papers (2020-01-31T22:35:08Z)
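For the "Stabilizing Extreme Q-learning by Maclaurin Expansion" entry above, the following small Python sketch illustrates the idea as described in the summary: XQL's value loss, derived from a Gumbel assumption on the Bellman error, contains an exponential term, and the Maclaurin-expanded variant replaces that exponential with a truncated series to tame its growth. The function names, the `order` parameter, and the normalisation are assumptions for illustration; the paper's exact formulation may differ.

```python
import math
import numpy as np

def xql_gumbel_loss(bellman_err, beta=1.0):
    """Exponential ("Gumbel regression") loss used by Extreme Q-learning:
    large positive errors are penalised exponentially, which can destabilise
    training when a few errors become large."""
    z = bellman_err / beta
    return np.mean(np.exp(z) - z - 1.0)

def maclaurin_xql_loss(bellman_err, beta=1.0, order=4):
    """Illustrative Maclaurin-expanded variant: exp(z) is replaced by the
    truncated series sum_{k=0..order} z**k / k!, so the loss grows only
    polynomially in the Bellman error (order is a hypothetical knob)."""
    z = bellman_err / beta
    exp_approx = sum(z**k / math.factorial(k) for k in range(order + 1))
    return np.mean(exp_approx - z - 1.0)

# Toy comparison: a single large error dominates the exponential loss but
# contributes only polynomially to the expanded one.
errs = np.concatenate([np.random.default_rng(0).normal(0.0, 1.0, 1000), [25.0]])
print(xql_gumbel_loss(errs), maclaurin_xql_loss(errs))
```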