Related papers: Concave Utility Reinforcement Learning with Zero-Constraint Violations

Concave Utility Reinforcement Learning with Zero-Constraint Violations

URL: http://arxiv.org/abs/2109.05439v3
Date: Fri, 17 Nov 2023 02:20:40 GMT
Title: Concave Utility Reinforcement Learning with Zero-Constraint Violations
Authors: Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
Abstract summary: We consider the problem of concave utility reinforcement learning (CURL) with convex constraints. We propose a model-based learning algorithm that also achieves zero constraint violations.
Score: 43.29210413964558
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. For this, we propose a model-based learning algorithm that also achieves zero constraint violations. Assuming that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures, we solve a tighter optimization problem to ensure that the constraints are never violated despite the imprecise model knowledge and model stochasticity. We use Bellman error-based analysis for tabular infinite-horizon setups which allows analyzing stochastic policies. Combining the Bellman error-based analysis and tighter optimization equation, for $T$ interactions with the environment, we obtain a high-probability regret guarantee for objective which grows as $\Tilde{O}(1/\sqrt{T})$, excluding other factors. The proposed method can be applied for optimistic algorithms to obtain high-probability regret bounds and also be used for posterior sampling algorithms to obtain a loose Bayesian regret bounds but with significant improvement in computational complexity.

Related papers

Learning based convex approximation for constrained parametric optimization [11.379408842026981]
We propose an input neural network (ICNN)-based self-supervised learning framework to solve constrained optimization problems.<n>We provide rigorous convergence analysis, showing that the framework converges to a Karush-Kuhn-Tucker (KKT) approximation point of the original problem.<n>Our approach achieves a superior balance among accuracy, feasibility, and computational efficiency.
arXiv Detail & Related papers (2025-05-07T00:33:14Z)
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z)
Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds [28.4976864705409]
This paper studies constrained Markov decision processes (CMDPs) with constraints against estimator thresholds, aiming at safety of reinforcement learning in unknown and uncertain environments. We leverage a GrowingWindow sampling from interactions with the uncertain and dynamic environment to estimate the thresholds, based on which we design Pessimistic-Optimistic Thresholding (SPOT), a novel model-based primal-dual algorithm for multiple constraints against thresholds. SPOT is the first reinforcement learning algorithm that realises guaranteed performance in an uncertain environment where even thresholds are unknown.
arXiv Detail & Related papers (2025-04-07T11:58:19Z)
Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform under both and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z)
Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints [21.38692458445459]
We study the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints. A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case.
arXiv Detail & Related papers (2023-10-02T08:07:36Z)
Projection-Free Online Convex Optimization with Stochastic Constraints [0.0]
We develop projection-free algorithms for online convex optimization with constraints. We deduce sublinear regret and constraint violation bounds for various settings. We prove that the constraint violation can be reduced to have the same growth as the regret.
arXiv Detail & Related papers (2023-05-02T11:27:34Z)
Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated non functional constraints and use a large number of data points. Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems. We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations. More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
Fast Global Convergence of Policy Optimization for Constrained MDPs [17.825031573375725]
We show that gradient-based methods can achieve an $mathcalO(log(T)/T)$ global convergence rate both for the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$.
arXiv Detail & Related papers (2021-10-31T17:46:26Z)
High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level. We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives. We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem. We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.