Concave Utility Reinforcement Learning with Zero-Constraint Violations
- URL: http://arxiv.org/abs/2109.05439v3
- Date: Fri, 17 Nov 2023 02:20:40 GMT
- Title: Concave Utility Reinforcement Learning with Zero-Constraint Violations
- Authors: Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
- Abstract summary: We consider the problem of concave utility reinforcement learning (CURL) with convex constraints.
We propose a model-based learning algorithm that also achieves zero constraint violations.
- Score: 43.29210413964558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of tabular infinite horizon concave utility
reinforcement learning (CURL) with convex constraints. For this, we propose a
model-based learning algorithm that also achieves zero constraint violations.
Assuming that the concave objective and the convex constraints have a solution
interior to the set of feasible occupation measures, we solve a tighter
optimization problem to ensure that the constraints are never violated despite
the imprecise model knowledge and model stochasticity. We use Bellman
error-based analysis for tabular infinite-horizon setups which allows analyzing
stochastic policies. Combining the Bellman error-based analysis and tighter
optimization equation, for $T$ interactions with the environment, we obtain a
high-probability regret guarantee for objective which grows as
$\Tilde{O}(1/\sqrt{T})$, excluding other factors. The proposed method can be
applied for optimistic algorithms to obtain high-probability regret bounds and
also be used for posterior sampling algorithms to obtain a loose Bayesian
regret bounds but with significant improvement in computational complexity.
Related papers
- Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform under both and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z) - Online Constraint Tightening in Stochastic Model Predictive Control: A
Regression Approach [49.056933332667114]
No analytical solutions exist for chance-constrained optimal control problems.
We propose a data-driven approach for learning the constraint-tightening parameters online during control.
Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints.
arXiv Detail & Related papers (2023-10-04T16:22:02Z) - Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine
Constraints [21.38692458445459]
We study the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints.
A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case.
arXiv Detail & Related papers (2023-10-02T08:07:36Z) - Projection-Free Online Convex Optimization with Stochastic Constraints [0.0]
We develop projection-free algorithms for online convex optimization with constraints.
We deduce sublinear regret and constraint violation bounds for various settings.
We prove that the constraint violation can be reduced to have the same growth as the regret.
arXiv Detail & Related papers (2023-05-02T11:27:34Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Fast Global Convergence of Policy Optimization for Constrained MDPs [17.825031573375725]
We show that gradient-based methods can achieve an $mathcalO(log(T)/T)$ global convergence rate both for the optimality gap and the constraint violation.
When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$.
arXiv Detail & Related papers (2021-10-31T17:46:26Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained
Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives.
We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem.
We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.