Concave Utility Reinforcement Learning with Zero-Constraint Violations
- URL: http://arxiv.org/abs/2109.05439v3
- Date: Fri, 17 Nov 2023 02:20:40 GMT
- Title: Concave Utility Reinforcement Learning with Zero-Constraint Violations
- Authors: Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
- Abstract summary: We consider the problem of concave utility reinforcement learning (CURL) with convex constraints.
We propose a model-based learning algorithm that also achieves zero constraint violations.
- Score: 43.29210413964558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of tabular infinite horizon concave utility
reinforcement learning (CURL) with convex constraints. For this, we propose a
model-based learning algorithm that also achieves zero constraint violations.
Assuming that the concave objective and the convex constraints have a solution
interior to the set of feasible occupation measures, we solve a tighter
optimization problem to ensure that the constraints are never violated despite
the imprecise model knowledge and model stochasticity. We use Bellman
error-based analysis for tabular infinite-horizon setups which allows analyzing
stochastic policies. Combining the Bellman error-based analysis and tighter
optimization equation, for $T$ interactions with the environment, we obtain a
high-probability regret guarantee for objective which grows as
$\Tilde{O}(1/\sqrt{T})$, excluding other factors. The proposed method can be
applied for optimistic algorithms to obtain high-probability regret bounds and
also be used for posterior sampling algorithms to obtain a loose Bayesian
regret bounds but with significant improvement in computational complexity.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform under both and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z) - Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine
Constraints [21.38692458445459]
We study the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints.
A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case.
arXiv Detail & Related papers (2023-10-02T08:07:36Z) - Projection-Free Online Convex Optimization with Stochastic Constraints [0.0]
We develop projection-free algorithms for online convex optimization with constraints.
We deduce sublinear regret and constraint violation bounds for various settings.
We prove that the constraint violation can be reduced to have the same growth as the regret.
arXiv Detail & Related papers (2023-05-02T11:27:34Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Fast Global Convergence of Policy Optimization for Constrained MDPs [17.825031573375725]
We show that gradient-based methods can achieve an $mathcalO(log(T)/T)$ global convergence rate both for the optimality gap and the constraint violation.
When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$.
arXiv Detail & Related papers (2021-10-31T17:46:26Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained
Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives.
We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem.
We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.