Truly No-Regret Learning in Constrained MDPs
- URL: http://arxiv.org/abs/2402.15776v3
- Date: Fri, 19 Jul 2024 14:00:43 GMT
- Title: Truly No-Regret Learning in Constrained MDPs
- Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He
- Abstract summary: We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
- Score: 61.78619476991494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in another. This makes the online learning process unsafe since it only guarantees safety for the final (mixture) policy but not during learning. As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations. In this paper, we give the first affirmative answer. We first generalize a result on last-iterate convergence of regularized primal-dual schemes to CMDPs with multiple constraints. Building upon this insight, we propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations.
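To make "error cancellations" concrete, here is a hedged sketch in generic notation (K episodes, constraint value $V^c(\pi_k)$ of the policy $\pi_k$ played in episode $k$, constraint threshold $b$; these symbols are illustrative and may differ from the paper's exact definitions):
$$\text{weak violation: } \sum_{k=1}^{K} \big(b - V^c(\pi_k)\big) \qquad \text{vs.} \qquad \text{strong violation: } \sum_{k=1}^{K} \big[\, b - V^c(\pi_k) \,\big]_{+},$$
where $[x]_+ = \max\{x, 0\}$. In the weak notion, an episode with strict satisfaction ($V^c(\pi_k) > b$) can offset an earlier violation, so a sublinear bound does not guarantee safety during learning; the strong notion counts every violation, which is the cancellation-free guarantee this paper establishes for a primal-dual algorithm.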
Related papers
- Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization [37.24692425018]
We study online learning in constrained MDPs (CMDPs).
Our algorithm implements a primal-dual scheme that employs a state-of-the-art policy optimization approach for adversarial MDPs (a generic sketch of such a primal-dual update loop appears after this list).
arXiv Detail & Related papers (2024-10-03T07:54:04Z)
- Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints [34.7178680288326]
In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining sublinear regret and sublinear constraint violation.
We show that this negative result can be eased in CMDPs with non-stationary rewards and constraints by providing algorithms whose performance degrades smoothly as non-stationarity increases.
arXiv Detail & Related papers (2024-05-23T09:48:48Z)
- Learning Adversarial MDPs with Stochastic Hard Constraints [37.24692425018]
We study online learning problems in constrained Markov decision processes with adversarial losses and hard constraints.
We design an algorithm that achieves sublinear regret while ensuring that the constraints are satisfied at every episode with high probability.
arXiv Detail & Related papers (2024-03-06T12:49:08Z)
- Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes [24.51454563844664]
We propose a novel model-based dual algorithm OptAug-CMDP for finite-horizon CMDPs.
Our algorithm achieves a regret bound without the need for error cancellations.
arXiv Detail & Related papers (2023-06-12T10:10:57Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe$(\psi,\phi)$, guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered Weighted $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression, obtained by exploring the order of the primal solution despite its unknown order structure.
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
- Learning and Planning in Average-Reward Markov Decision Processes [15.586087060535398]
We introduce learning and planning algorithms for average-reward MDPs.
All of our algorithms are based on using the temporal-difference (TD) error, rather than the conventional error, when updating the estimate of the average reward (a minimal sketch of this update appears after this list).
arXiv Detail & Related papers (2020-06-29T19:03:24Z)
- Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failure budget more efficiently than state-of-the-art methods in a variety of optimization experiments, and generally achieves lower regret.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
- Exploration-Exploitation in Constrained MDPs [79.23623305214275]
We investigate the exploration-exploitation dilemma in Constrained Markov Decision Processes (CMDPs).
While learning in an unknown CMDP, an agent should trade off exploration, to discover new information about the MDP, against exploitation of its current knowledge to maximize reward.
While the agent will eventually learn a good or optimal policy, we do not want the agent to violate the constraints too often during the learning process.
arXiv Detail & Related papers (2020-03-04T17:03:56Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which requires only trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
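Several of the papers above, including the main paper, share the same primal-dual template: act greedily with respect to a Lagrangian-penalized reward, then update the dual variable by projected gradient ascent on the observed constraint violation. Below is the sketch referenced in the list, a minimal illustration on a toy one-state CMDP; every name and constant in it is an illustrative assumption, not any paper's algorithm.
```python
# Minimal primal-dual sketch on a toy one-state CMDP (a constrained bandit).
# All rewards, costs, and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

rewards = np.array([1.0, 0.6, 0.2])   # expected reward of each action
costs   = np.array([0.9, 0.4, 0.1])   # expected cost of each action
budget  = 0.5                          # constraint: expected cost <= budget

lam, eta, K = 0.0, 0.05, 2000          # dual variable, step size, episodes
cum_violation = 0.0                    # strong (cancellation-free) violation

for k in range(K):
    # Primal step: act greedily w.r.t. the Lagrangian reward r(a) - lam * c(a).
    a = int(np.argmax(rewards - lam * costs))

    # Observe noisy reward/cost feedback for the chosen action.
    r = rewards[a] + 0.1 * rng.standard_normal()
    c = costs[a] + 0.1 * rng.standard_normal()

    # Dual step: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + eta * (c - budget))

    # Strong violation bookkeeping: strict satisfaction (c < budget) in one
    # episode does not offset a violation in another.
    cum_violation += max(0.0, c - budget)

print(f"final lambda = {lam:.3f}, strong violation = {cum_violation:.1f}")
```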
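The average-reward entry above updates the average-reward estimate with the TD error instead of the conventional error $R_{t+1} - \bar{R}_t$. Here is the minimal sketch referenced in that entry, on an assumed two-state Markov chain; the chain and all constants are illustrative assumptions.
```python
# Minimal sketch of average-reward TD learning where the average-reward
# estimate is updated with the TD error (the idea behind differential TD),
# rather than with the conventional error r - avg_r.
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov chain: from each state, switch to the other state w.p. 0.7.
P = np.array([[0.3, 0.7],
              [0.7, 0.3]])
R = np.array([1.0, 0.0])     # reward received on leaving each state

v = np.zeros(2)              # differential value estimates
avg_r = 0.0                  # estimate of the average reward
alpha, eta = 0.05, 0.1       # value step size, average-reward step size

s = 0
for t in range(50_000):
    s_next = rng.choice(2, p=P[s])
    r = R[s]

    # TD error in the average-reward (differential) setting.
    delta = r - avg_r + v[s_next] - v[s]

    v[s] += alpha * delta
    avg_r += eta * alpha * delta   # update uses the TD error, not r - avg_r
    s = s_next

# The chain's stationary distribution is uniform, so the true average
# reward is 0.5 * 1.0 + 0.5 * 0.0 = 0.5.
print(f"estimated average reward = {avg_r:.3f} (true value: 0.5)")
```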