Related papers: The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback

The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback

URL: http://arxiv.org/abs/2305.00889v1
Date: Mon, 1 May 2023 15:48:34 GMT
Title: The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback
Authors: Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh
Abstract summary: We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm.
Score: 5.758073912084366
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.

Related papers

Policy Testing in Markov Decision Processes [48.642181362172906]
We study the policy testing problem in discounted decision processes (MDP) under the fixed-confidence setting.<n>The goal is to determine whether the value of a given policy exceeds a numerical threshold.
arXiv Detail & Related papers (2025-05-21T10:13:54Z)
Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions [8.758206783988404]
We propose a reinforcement learning framework under a broad class of risk objectives, characterized by convex scoring functions.<n>This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility.<n>We validate our approach in simulation experiments with a financial application in statistical arbitrage trading, demonstrating the effectiveness of the algorithm.
arXiv Detail & Related papers (2025-05-07T16:31:42Z)
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z)
A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions. We prove that these measures are textitly expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyper and determinants.
arXiv Detail & Related papers (2024-06-06T01:52:09Z)
Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction. We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z)
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values. We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Directional Optimism for Safe Linear Bandits [4.84955052297084]
The safe linear bandit problem is a version of the classical linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds. We find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets. Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.
arXiv Detail & Related papers (2023-08-29T03:54:53Z)
Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints [21.38692458445459]
This paper studies the problem of online performance optimization of constrained closed-loop control systems. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution.
arXiv Detail & Related papers (2023-04-12T18:37:52Z)
Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems. We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations. More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
Concave Utility Reinforcement Learning with Zero-Constraint Violations [43.29210413964558]
We consider the problem of concave utility reinforcement learning (CURL) with convex constraints. We propose a model-based learning algorithm that also achieves zero constraint violations.
arXiv Detail & Related papers (2021-09-12T06:13:33Z)
Screening Rules and its Complexity for Active Set Identification [16.762870396299334]
We show that screening rules stem from a combination of natural properties of subdifferential sets and optimality conditions. Under mild assumptions, we analyze the number of iterations needed to identify the optimal active set for any converging algorithm.
arXiv Detail & Related papers (2020-09-06T11:10:34Z)
Constrained episodic reinforcement learning in concave-convex and knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.