No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!
- URL: http://arxiv.org/abs/2506.13244v3
- Date: Wed, 18 Jun 2025 14:04:08 GMT
- Title: No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!
- Authors: Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Christian Kroer,
- Abstract summary: We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds.
- Score: 56.80767500991973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online decision making problems under resource constraints, where both reward and cost functions are drawn from distributions that may change adversarially over time. We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. To address this challenge, we analyze a framework in which the learner is guided by a spending plan--a sequence prescribing expected resource usage across rounds. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds. We additionally provide a robust variant of our methods to handle worst-case scenarios where the spending plan is highly imbalanced. To conclude, we study the regret of our algorithms when competing against benchmarks that deviate from the prescribed spending plan.
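To make the (primal-)dual idea concrete, below is a minimal, self-contained sketch of a pacing-style dual method guided by a spending plan, for a single resource in the resource-allocation setting (rewards and costs observed before action selection). The function name `spending_plan_dual`, the step size `eta`, and the dual cap `lam_max` are illustrative assumptions and not taken from the paper; this is a simplified sketch of the kind of method described above, not the authors' exact algorithm.

```python
import numpy as np

def spending_plan_dual(rewards, costs, plan, total_budget, eta=0.05, lam_max=10.0):
    """Pacing-style primal-dual learner guided by a spending plan (single resource).

    rewards, costs: arrays of shape (T, n_actions); in the resource-allocation
    setting they are revealed before the action is chosen.
    plan: length-T spending plan b_1, ..., b_T with sum(plan) ~= total_budget.
    Illustrative sketch only; not the paper's exact algorithm.
    """
    T, _ = rewards.shape
    lam = 0.0                        # dual variable: running "price" of the resource
    remaining = float(total_budget)
    total_reward = 0.0
    for t in range(T):
        # Primal step: pick the best action under the Lagrangian-adjusted reward.
        scores = rewards[t] - lam * costs[t]
        a = int(np.argmax(scores))
        spent = 0.0
        # Play only if affordable and worthwhile at the current price (else skip).
        if costs[t, a] <= remaining and scores[a] > 0.0:
            total_reward += rewards[t, a]
            remaining -= costs[t, a]
            spent = costs[t, a]
        # Dual step: raise the price when spending exceeds the plan, lower it otherwise.
        lam = float(np.clip(lam + eta * (spent - plan[t]), 0.0, lam_max))
    return total_reward, remaining

# Example with a well-balanced spending plan, the regime favoured by the guarantees above.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K, B = 1000, 5, 200.0
    r, c = rng.random((T, K)), rng.random((T, K))
    print(spending_plan_dual(r, c, plan=np.full(T, B / T), total_budget=B))
```

With a well-balanced plan such as `np.full(T, B / T)`, the per-round targets keep the dual variable stable; with a highly imbalanced plan the price can swing sharply, which is the worst case the abstract's robust variant is meant to handle.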
Related papers
- Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices [13.68761797358598]
We study the dynamic pricing problem with knapsack, addressing the challenge of balancing exploration and exploitation under resource constraints. We introduce three algorithms tailored to different informational settings: a Boundary Attracted Re-solve Method for full information, an online learning algorithm for scenarios with no prior information, and an estimate-then-select re-solve algorithm that leverages machine-learned informed prices with a known upper bound on estimation errors.
arXiv Detail & Related papers (2025-01-24T00:46:52Z) - Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We define an online learning and optimization problem with discrete and irreversible decisions contributing toward a coverage target. In each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a classification model to guide future decisions. The goal is to minimize facility openings under a chance constraint reflecting the coverage target, in a regime characterized by a large target number of facilities.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - Online Learning with Costly Features in Non-stationary Environments [6.009759445555003]
In sequential decision-making problems, maximizing long-term rewards is the primary goal.
In real-world problems, collecting beneficial information is often costly.
We develop an algorithm that guarantees regret that is sublinear in time.
arXiv Detail & Related papers (2023-07-18T16:13:35Z) - Online Resource Allocation in Episodic Markov Decision Processes [1.8416014644193066]
We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process.
We propose the observe-then-decide regime and improve the existing decide-then-observe regime.
We show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde{O}(\rho^{-1} H^{3/2} S \sqrt{AT})$ with high probability.
arXiv Detail & Related papers (2023-05-18T06:28:34Z) - Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to handle general, possibly non-convex, reward and resource-consumption functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Dynamic Planning and Learning under Recovering Rewards [18.829837839926956]
We propose, construct and prove performance guarantees for a class of "Purely Periodic Policies".
Our framework and policy design may have the potential to be adapted into other offline planning and online learning applications.
arXiv Detail & Related papers (2021-06-28T15:40:07Z) - Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z) - Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
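As a loose illustration of contextual-bandit policy learning under a budget constraint (the setting of the HATCH paper above), here is a plain LinUCB learner with a hard budget stop. This is an assumed, generic sketch and not the hierarchical HATCH method itself; the function name, `alpha`, and the data layout are all illustrative choices.

```python
import numpy as np

def budgeted_linucb(contexts, rewards, costs, budget, alpha=1.0):
    """LinUCB with a hard budget on cumulative cost (illustrative sketch only).

    contexts: (T, K, d) per-arm feature vectors; rewards, costs: (T, K) values,
    of which only the pulled arm's entries would be observed in practice.
    """
    T, K, d = contexts.shape
    A = [np.eye(d) for _ in range(K)]       # per-arm regularized design matrices
    b = [np.zeros(d) for _ in range(K)]     # per-arm accumulated reward features
    remaining, total_reward = float(budget), 0.0
    for t in range(T):
        if remaining <= 0.0:
            break                           # budget exhausted: stop pulling arms
        ucb = np.empty(K)
        for k in range(K):
            A_inv = np.linalg.inv(A[k])
            theta = A_inv @ b[k]            # ridge-regression estimate for arm k
            x = contexts[t, k]
            ucb[k] = theta @ x + alpha * np.sqrt(x @ A_inv @ x)
        k = int(np.argmax(ucb))             # optimistic arm choice
        total_reward += rewards[t, k]
        remaining -= costs[t, k]
        A[k] += np.outer(contexts[t, k], contexts[t, k])
        b[k] += rewards[t, k] * contexts[t, k]
    return total_reward
```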
This list is automatically generated from the titles and abstracts of the papers in this site.