Online Learning with Knapsacks: the Best of Both Worlds
- URL: http://arxiv.org/abs/2202.13710v1
- Date: Mon, 28 Feb 2022 12:10:48 GMT
- Title: Online Learning with Knapsacks: the Best of Both Worlds
- Authors: Matteo Castiglioni, Andrea Celli, Christian Kroer
- Abstract summary: We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to handle non-convex reward and cost functions.
- Score: 54.28273783164608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning problems in which a decision maker wants to maximize
their expected reward without violating a finite set of $m$ resource
constraints. By casting the learning process over a suitably defined space of
strategy mixtures, we recover strong duality on a Lagrangian relaxation of the
underlying optimization problem, even for general settings with non-convex
reward and resource-consumption functions. Then, we provide the first
best-of-both-worlds type framework for this setting, with no-regret guarantees
both under stochastic and adversarial inputs. Our framework yields the same
regret guarantees as prior work in the stochastic case. On the other hand, when
budgets grow at least linearly in the time horizon, it allows us to provide a
constant competitive ratio in the adversarial case, which improves over the
$O(m \log T)$ competitive ratio of Immorlica et al. (2019). Moreover, our
framework allows the decision maker to handle non-convex reward and cost
functions. We provide two game-theoretic applications of our framework to give
further evidence of its flexibility.
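To make the kind of Lagrangian primal-dual scheme described above more concrete, here is a minimal sketch under illustrative assumptions. It is not the authors' algorithm: the primal player is plain Hedge on the Lagrangian payoff, the dual player is projected gradient ascent on the multipliers, and names such as `reward_fn`, `cost_fn`, and `dual_lr` are made up for the example.

```python
import numpy as np

def lagrangian_primal_dual(actions, reward_fn, cost_fn, budget, horizon,
                           dual_lr=0.05, seed=0):
    """Sketch of a Lagrangian primal-dual loop for online learning with
    m resource constraints. Primal: Hedge on the Lagrangian payoff
    r(a) - lambda . c(a) (full-information variant for simplicity).
    Dual: projected gradient ascent on the multipliers."""
    rng = np.random.default_rng(seed)
    n, m = len(actions), len(budget)
    rho = np.asarray(budget, dtype=float) / horizon       # per-round budget B_j / T
    weights = np.ones(n) / n                              # primal Hedge distribution
    lam = np.zeros(m)                                     # dual multipliers
    remaining = np.asarray(budget, dtype=float)
    primal_lr = np.sqrt(np.log(n) / horizon)
    total_reward = 0.0

    for t in range(horizon):
        if np.any(remaining < 1.0):                       # stop before any resource runs out
            break
        a = rng.choice(n, p=weights)
        r = reward_fn(actions[a], t)                          # reward in [0, 1]
        c = np.asarray(cost_fn(actions[a], t), dtype=float)   # costs in [0, 1]^m
        total_reward += r
        remaining -= c

        # Primal update: Hedge on the Lagrangian payoff of every action.
        lagr = np.array([reward_fn(x, t) - lam @ np.asarray(cost_fn(x, t))
                         for x in actions])
        weights *= np.exp(primal_lr * (lagr - lagr.max()))    # shift for numerical stability
        weights /= weights.sum()

        # Dual update: push multipliers up when consumption exceeds the per-round budget.
        lam = np.maximum(lam + dual_lr * (c - rho), 0.0)

    return total_reward, remaining

if __name__ == "__main__":
    # Toy instance: two actions, one resource; action 1 pays more but costs more.
    rewards, costs = [0.3, 0.9], [[0.1], [0.8]]
    total, left = lagrangian_primal_dual(
        actions=[0, 1],
        reward_fn=lambda a, t: rewards[a],
        cost_fn=lambda a, t: costs[a],
        budget=[500.0],
        horizon=1000,
    )
    print(f"total reward {total:.1f}, remaining budget {left}")
```

The dual variables grow when a resource is consumed faster than its per-round budget $B_j/T$, which in turn penalizes costly actions in the primal update; best-of-both-worlds analyses typically equip both players with no-regret guarantees.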
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}}\cdot\frac{1}{1-2^{-T}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
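As a quick sanity check of the stated convergence (our own arithmetic, not from the paper): since $\frac{1}{1-2^{-T}} = 1 + 2^{-T} + O(4^{-T})$, the rate $\Theta\left(m^{\frac{1}{2}}\cdot\frac{1}{1-2^{-T}}\right)$ approaches $\Theta(\sqrt{m})$ with an error that decays geometrically in $T$, which is exactly the exponentially fast convergence mentioned above.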
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
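For background (the standard definition, not taken from the abstract above): a random variable $X$ first-order stochastically dominates $Y$ when $\Pr[X \ge t] \ge \Pr[Y \ge t]$ for all $t \in \mathbb{R}$, i.e. $X$ is at least as likely as $Y$ to exceed any threshold. Works like the one above generalize and relax such comparisons so they can be made between arbitrary pairs of random variables and used as optimization objectives.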
arXiv Detail & Related papers (2024-02-05T03:21:23Z) - Bandits with Replenishable Knapsacks: the Best of both Worlds [26.786438972889208]
We study a natural generalization of the BwK framework which allows non-monotonic resource utilization.
Under an input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound.
arXiv Detail & Related papers (2023-06-14T12:34:00Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Stochastic Compositional Optimization with Compositional Constraints [8.535822787583486]
We study a novel model that incorporates single-level expected value and two-level compositional constraints into the current SCO framework.
Our model can be applied widely to data-driven optimization and risk management, including risk-averse optimization and high-moment portfolio selection.
We propose a class of primal-dual algorithms that generates sequences converging to the optimal solution at a rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$ under both single-level expected value and two-level compositional constraints.
arXiv Detail & Related papers (2022-09-09T02:06:35Z) - Best of Both Worlds Model Selection [39.211071446838474]
We study the problem of model selection in bandit scenarios in the presence of nested policy classes.
Our approach requires that each base learner comes with a candidate regret bound that may or may not hold.
These are the first theoretical results that achieve best of both worlds (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios.
arXiv Detail & Related papers (2022-06-29T20:57:30Z) - Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
arXiv Detail & Related papers (2020-07-13T16:30:47Z)
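As an illustration of the computational primitive mentioned above (a sketch with made-up system matrices, not the paper's construction), a single discrete algebraic Riccati solve and the resulting LQR gain can be computed with SciPy:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical double-integrator system (illustrative only).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)           # state cost
R = np.array([[1.0]])   # input cost

# One discrete algebraic Riccati solve gives the optimal LQR gain u_t = -K x_t.
# An optimistic scheme would repeat such solves for (A, B) candidates drawn
# from a confidence set around the estimated dynamics.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("LQR gain K:", K)
```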