The Impact of the Geometric Properties of the Constraint Set in Safe
Optimization with Bandit Feedback
- URL: http://arxiv.org/abs/2305.00889v1
- Date: Mon, 1 May 2023 15:48:34 GMT
- Title: The Impact of the Geometric Properties of the Constraint Set in Safe
Optimization with Bandit Feedback
- Authors: Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh
- Abstract summary: We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment.
We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm.
- Score: 5.758073912084366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a safe optimization problem with bandit feedback in which an
agent sequentially chooses actions and observes responses from the environment,
with the goal of maximizing an arbitrary function of the response while
respecting stage-wise constraints. We propose an algorithm for this problem,
and study how the geometric properties of the constraint set impact the regret
of the algorithm. In order to do so, we introduce the notion of the sharpness
of a particular constraint set, which characterizes the difficulty of
performing learning within the constraint set in an uncertain setting. This
concept of sharpness allows us to identify the class of constraint sets for
which the proposed algorithm is guaranteed to enjoy sublinear regret.
Simulation results for this algorithm support the sublinear regret bound and
provide empirical evidence that the sharpness of the constraint set impacts the
performance of the algorithm.
Related papers
- A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are textitly expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyper and determinants.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - Resilient Constrained Reinforcement Learning [87.4374430686956]
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before study.
It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward training objective and the constraint satisfaction.
We propose a new constrained RL approach that searches for policy and constraint specifications together.
arXiv Detail & Related papers (2023-12-28T18:28:23Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Directional Optimism for Safe Linear Bandits [4.84955052297084]
The safe linear bandit problem is a version of the classical linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds.
We find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets.
Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.
arXiv Detail & Related papers (2023-08-29T03:54:53Z) - Primal-Dual Contextual Bayesian Optimization for Control System Online
Optimization with Time-Average Constraints [21.38692458445459]
This paper studies the problem of online performance optimization of constrained closed-loop control systems.
A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution.
arXiv Detail & Related papers (2023-04-12T18:37:52Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Concave Utility Reinforcement Learning with Zero-Constraint Violations [43.29210413964558]
We consider the problem of concave utility reinforcement learning (CURL) with convex constraints.
We propose a model-based learning algorithm that also achieves zero constraint violations.
arXiv Detail & Related papers (2021-09-12T06:13:33Z) - Screening Rules and its Complexity for Active Set Identification [16.762870396299334]
We show that screening rules stem from a combination of natural properties of subdifferential sets and optimality conditions.
Under mild assumptions, we analyze the number of iterations needed to identify the optimal active set for any converging algorithm.
arXiv Detail & Related papers (2020-09-06T11:10:34Z) - Constrained episodic reinforcement learning in concave-convex and
knapsack settings [81.08055425644037]
We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints.
Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
arXiv Detail & Related papers (2020-06-09T05:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.