A Unifying Framework for Online Optimization with Long-Term Constraints
- URL: http://arxiv.org/abs/2209.07454v1
- Date: Thu, 15 Sep 2022 16:59:19 GMT
- Title: A Unifying Framework for Online Optimization with Long-Term Constraints
- Authors: Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Giulia Romano,
Nicola Gatti
- Abstract summary: We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-world type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
- Score: 62.35194099438855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning problems in which a decision maker has to take a
sequence of decisions subject to $m$ long-term constraints. The goal of the
decision maker is to maximize their total reward, while at the same time
achieving small cumulative constraints violation across the $T$ rounds. We
present the first best-of-both-world type algorithm for this general class of
problems, with no-regret guarantees both in the case in which rewards and
constraints are selected according to an unknown stochastic model, and in the
case in which they are selected at each round by an adversary. Our algorithm is
the first to provide guarantees in the adversarial setting with respect to the
optimal fixed strategy that satisfies the long-term constraints. In particular,
it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear
regret, where $\rho$ is a feasibility parameter related to the existence of
strictly feasible solutions. Our framework employs traditional regret
minimizers as black-box components. Therefore, by instantiating it with an
appropriate choice of regret minimizers it can handle the full-feedback as well
as the bandit-feedback setting. Moreover, it allows the decision maker to
seamlessly handle scenarios with non-convex rewards and constraints. We show
how our framework can be applied in the context of budget-management mechanisms
for repeated auctions in order to guarantee long-term constraints that are not
packing (e.g., ROI constraints).
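The framework described above treats regret minimizers as interchangeable black boxes. As a rough illustration only (not the authors' algorithm), the following Python sketch wires two multiplicative-weights (Hedge) regret minimizers into a Lagrangian primal-dual loop: the primal player chooses actions, the dual player chooses a Lagrange-multiplier vector from a finite grid, and each is fed the corresponding Lagrangian utility. All names, the finite action set, the multiplier grid, and the learning rates are assumptions made for the example.
```python
import numpy as np

class Hedge:
    """Multiplicative-weights (Hedge) regret minimizer over a finite decision set.
    It plays the role of the black-box component mentioned in the abstract."""

    def __init__(self, n, lr):
        self.w = np.ones(n)
        self.lr = lr

    def next_decision(self):
        p = self.w / self.w.sum()
        return int(np.random.choice(len(self.w), p=p))

    def observe_utility(self, u):
        # Full-feedback update: u has one (bounded) utility entry per decision.
        self.w *= np.exp(self.lr * np.asarray(u))


def primal_dual(T, n_actions, dual_grid, reward_fn, cost_fn, lr=0.05):
    """Lagrangian loop with two black-box regret minimizers.

    reward_fn(t) -> array of shape (n_actions,): reward of each action at round t.
    cost_fn(t)   -> array of shape (m, n_actions): constraint costs g_t, where
                    g_t[i, a] > 0 means action a violates constraint i at round t.
    dual_grid    -> finite list of candidate Lagrange-multiplier vectors (length-m arrays).
    """
    primal = Hedge(n_actions, lr)
    dual = Hedge(len(dual_grid), lr)
    total_reward = 0.0
    violation = np.zeros(len(dual_grid[0]))

    for t in range(T):
        a = primal.next_decision()                 # primal plays an action
        lam = dual_grid[dual.next_decision()]      # dual plays a multiplier vector
        f, g = reward_fn(t), cost_fn(t)

        # The primal player is fed the Lagrangian reward f(x) - <lam, g(x)>.
        primal.observe_utility(f - lam @ g)
        # The dual player is fed <lam_k, g(a_t)> for every candidate multiplier,
        # so it shifts weight toward the most violated constraints.
        dual.observe_utility(np.array([lam_k @ g[:, a] for lam_k in dual_grid]))

        total_reward += f[a]
        violation += np.maximum(g[:, a], 0.0)      # cumulative positive violation

    return total_reward, violation
```
Swapping Hedge for a bandit regret minimizer such as EXP3 on the primal side (and estimating f and g from the single observed entry) would give the bandit-feedback variant the abstract alludes to; the multiplier grid and learning rate above are illustrative choices, not prescriptions from the paper.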
Related papers
- Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints.
arXiv Detail & Related papers (2024-05-25T08:09:36Z) - No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
We show that it is possible to circumvent the issue of sublinear violations of constraints by requiring the primal and dual algorithm to be weakly adaptive.
In the former case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $\rho/(1+\rho)$.
These results allow us to obtain new results for the problem of contextual bandits with linear constraints.
arXiv Detail & Related papers (2024-05-10T16:22:33Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - Algorithm for Constrained Markov Decision Process with Linear
Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed that integrates two ingredients: an entropy-regularized policy optimizer and Vaidya's dual optimizer.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We cast online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to flexibly handle non-convex reward and cost functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe$(\psi,\phi)$, guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - A Lyapunov-Based Methodology for Constrained Optimization with Bandit
Feedback [22.17503016665027]
We consider problems where each action returns a random reward, cost, and penalty from an unknown joint distribution.
We propose a novel low-complexity algorithm, named $\mathtt{LyOn}$, and prove that it achieves $O(\sqrt{B}\log B)$ regret and $O(\log B/B)$ constraint-violation.
The low computational cost and sharp performance bounds of $\mathtt{LyOn}$ suggest that the Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems (a generic drift-plus-penalty sketch appears at the end of this list).
arXiv Detail & Related papers (2021-06-09T16:12:07Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Model-Free Algorithm and Regret Analysis for MDPs with Long-Term
Constraints [38.2783003051101]
This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDP with long-term constraints.
We note that these are the first results on regret analysis for MDPs with long-term constraints, where the transition probabilities are not known a priori.
arXiv Detail & Related papers (2020-06-10T17:19:29Z)
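The $\mathtt{LyOn}$ entry above appeals to the Lyapunov (drift-plus-penalty) design methodology. The sketch below is a generic drift-plus-penalty loop for a constrained bandit, not the $\mathtt{LyOn}$ algorithm itself: virtual queues accumulate constraint violation, and each round the learner maximizes an estimated reward penalized by the queue-weighted estimated costs. The estimators, the trade-off parameter V, and the per-round budget are illustrative assumptions.
```python
import numpy as np

def drift_plus_penalty(T, n_actions, m, sample_reward, sample_costs,
                       budget_per_round, V=10.0):
    """Generic drift-plus-penalty loop for a constrained bandit.

    sample_reward(a) -> observed reward of action a (scalar)
    sample_costs(a)  -> observed per-constraint costs of action a (shape (m,))
    budget_per_round -> allowed average per-round cost of each constraint (shape (m,))
    """
    Q = np.zeros(m)                      # virtual queues, one per constraint
    reward_hat = np.zeros(n_actions)     # empirical mean reward per action
    cost_hat = np.zeros((m, n_actions))  # empirical mean costs per action
    pulls = np.zeros(n_actions)

    total_reward = 0.0
    for t in range(T):
        if t < n_actions:
            a = t                        # pull each arm once to initialize estimates
        else:
            # Maximize V * (estimated reward) - (queue-weighted estimated cost).
            score = V * reward_hat - Q @ cost_hat
            a = int(np.argmax(score))

        r = sample_reward(a)
        c = np.asarray(sample_costs(a))

        # Update empirical means (bandit feedback: only the chosen arm is observed).
        pulls[a] += 1
        reward_hat[a] += (r - reward_hat[a]) / pulls[a]
        cost_hat[:, a] += (c - cost_hat[:, a]) / pulls[a]

        # Queue update: accumulate violation of the per-round budget.
        Q = np.maximum(Q + c - budget_per_round, 0.0)
        total_reward += r

    return total_reward, Q
```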