Stable Reinforcement Learning with Unbounded State Space
- URL: http://arxiv.org/abs/2006.04353v1
- Date: Mon, 8 Jun 2020 05:00:25 GMT
- Title: Stable Reinforcement Learning with Unbounded State Space
- Authors: Devavrat Shah, Qiaomin Xie, Zhi Xu
- Abstract summary: We consider the problem of reinforcement learning with unbounded state space motivated by the classical problem of scheduling in a queueing network.
Traditional policies, as well as error metrics, that are designed for finite, bounded, or compact state spaces require infinitely many samples to provide a meaningful performance guarantee.
We propose stability as the notion of "goodness": the state dynamics under the policy should remain in a bounded region with high probability.
- Score: 27.053432445897016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of reinforcement learning (RL) with unbounded state
space motivated by the classical problem of scheduling in a queueing network.
Traditional policies, as well as error metrics, that are designed for finite,
bounded, or compact state spaces require infinitely many samples to provide any
meaningful performance guarantee (e.g., $\ell_\infty$ error) for an unbounded
state space. That is, we need a new notion of performance metric. As the main
contribution of this work, inspired by the literature on queueing systems and
control theory, we propose stability as the notion of "goodness": the state
dynamics under the policy should remain in a bounded region with high
probability. As a proof of concept, we propose an RL policy using a
Sparse-Sampling-based Monte Carlo Oracle and argue that it satisfies the
stability property as long as the system dynamics under the optimal policy
respect a Lyapunov function. The assumption of the existence of a Lyapunov
function is not restrictive, as it is equivalent to the positive recurrence or
stability property of any Markov chain; that is, if there is any policy that can
stabilize the system, then it must possess a Lyapunov function. Moreover, our
policy does not utilize knowledge of the specific Lyapunov function. To make our
method sample efficient, we provide an improved, sample-efficient
Sparse-Sampling-based Monte Carlo Oracle with a Lipschitz value function that may
be of interest in its own right. Furthermore, we design an adaptive version of
the algorithm, based on carefully constructed statistical tests, which finds
the correct tuning parameter automatically.
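To make the planning component concrete, here is a minimal sketch of a sparse-sampling Monte Carlo oracle in the Kearns-Mansour-Ng style that the paper builds on: from the current state it expands a sampled lookahead tree of depth H with C successor samples per (state, action) pair and acts greedily on the resulting cost-to-go estimates. The `simulator` generative model and the parameter values are illustrative assumptions; the paper's improved, sample-efficient oracle (which exploits Lipschitz value functions) is not reproduced here.

```python
# Minimal sketch of a sparse-sampling Monte Carlo planning oracle (Kearns-Mansour-Ng style).
# `simulator(state, action)` is an assumed generative model returning (next_state, cost);
# depth, width, and gamma are illustrative tuning parameters, not the paper's exact choices.

def sparse_sampling_value(simulator, state, actions, depth, width, gamma):
    """Estimate the optimal cost-to-go from `state` via a sampled lookahead tree."""
    if depth == 0:
        return 0.0
    best = float("inf")
    for a in actions:
        total = 0.0
        for _ in range(width):  # C independent successor samples per (state, action)
            next_state, cost = simulator(state, a)
            total += cost + gamma * sparse_sampling_value(
                simulator, next_state, actions, depth - 1, width, gamma
            )
        best = min(best, total / width)  # greedy minimization over actions
    return best


def oracle_policy(simulator, state, actions, depth=3, width=5, gamma=0.95):
    """Act greedily with respect to the root-level sparse-sampling Q estimates."""
    def q_estimate(a):
        samples = [simulator(state, a) for _ in range(width)]
        return sum(
            cost + gamma * sparse_sampling_value(simulator, s_next, actions, depth - 1, width, gamma)
            for s_next, cost in samples
        ) / width
    return min(actions, key=q_estimate)
```

With a generative model of a queueing network (e.g., a routine that samples one time slot of arrivals and services and returns the holding cost), `oracle_policy` can be queried at each decision epoch; the adaptive variant described in the abstract would additionally tune the lookahead parameters via statistical tests.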
Related papers
- A Test-Function Approach to Incremental Stability [33.44344966171865]
The regularity of value functions, and their connection to incremental stability, can be understood in a way that is distinct from the traditional Lyapunov-based approach to certifying stability in control theory.
arXiv Detail & Related papers (2025-07-01T11:46:52Z) - Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions [15.306107403623075]
We study the problem of certifying the stability of closed-loop systems under control policies derived from optimal control or reinforcement learning (RL). Classical Lyapunov methods require a strict step-wise decrease in the Lyapunov function, but such a certificate is difficult to construct for a learned control policy. We formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-05-16T07:36:40Z) - Performance of NPG in Countable State-Space Average-Cost RL [12.949520455740092]
We consider policy optimization methods in reinforcement learning settings where the state space is arbitrarily large.
The motivation arises from control problems in communication networks, matching markets, and other queueing systems.
arXiv Detail & Related papers (2024-05-30T20:29:52Z) - Learning Over Contracting and Lipschitz Closed-Loops for
Partially-Observed Nonlinear Systems (Extended Version) [1.2430809884830318]
This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems.
We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions.
We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.
arXiv Detail & Related papers (2023-04-12T23:55:56Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed
Stability in Nonlinear Dynamical Systems [66.9461097311667]
We propose a model-based reinforcement learning framework with formal stability guarantees.
The proposed method learns the system dynamics up to a confidence interval using feature representation.
We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system.
arXiv Detail & Related papers (2022-06-03T17:27:04Z) - Youla-REN: Learning Nonlinear Feedback Policies with Robust Stability
Guarantees [5.71097144710995]
This paper presents a parameterization of nonlinear controllers for uncertain systems building on a recently developed neural network architecture.
The proposed framework has "built-in" guarantees of stability, i.e., all policies in the search space result in a contracting (globally exponentially stable) closed-loop system.
arXiv Detail & Related papers (2021-12-02T13:52:37Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than a deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z) - Convergence Guarantees of Policy Optimization Methods for Markovian Jump
Linear Systems [3.3343656101775365]
We show that the Gauss-Newton method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which stabilizes the closed-loop dynamics in the mean square sense.
We present an example to support our theory.
arXiv Detail & Related papers (2020-02-10T21:13:42Z)