Bandits with Anytime Knapsacks
- URL: http://arxiv.org/abs/2501.18560v1
- Date: Thu, 30 Jan 2025 18:36:13 GMT
- Title: Bandits with Anytime Knapsacks
- Authors: Eray Can Elumar, Cem Tekin, Osman Yagan
- Abstract summary: We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms.
We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework.
- Score: 14.69261466438399
- Abstract: We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.
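To make the mechanics concrete, below is a minimal Python sketch of a SUAK-style loop, under assumptions that are ours rather than the paper's: Bernoulli rewards and costs, a budget that accrues at a fixed rate $b$ per round (so the anytime constraint is that cumulative cost never exceeds $b \cdot t$), and a simple optimistic reward-per-cost selection rule. The skip test, the safety margin, and all parameter names are illustrative stand-ins, not the authors' actual SUAK.

```python
import numpy as np

def suak_style_sketch(means_r, means_c, b=0.5, T=10_000, margin=0.05, seed=0):
    """Illustrative UCB loop under an anytime budget constraint.

    Assumptions (ours, for illustration): rewards and costs are Bernoulli,
    and the budget accrues at rate b per round, so the anytime constraint
    is spent <= b * t at every round t. The learner targets (b - margin) * t,
    slightly under-utilizing the budget so rounds rarely need skipping.
    """
    rng = np.random.default_rng(seed)
    K = len(means_r)
    n = np.zeros(K)                  # number of pulls per arm
    r_hat = np.zeros(K)              # empirical mean reward per arm
    c_hat = np.zeros(K)              # empirical mean cost per arm
    spent, total_reward, skips = 0.0, 0.0, 0
    for t in range(1, T + 1):
        # Skip the round if a worst-case cost of 1 could violate the
        # (slightly tightened) anytime constraint.
        if spent + 1.0 > (b - margin) * t:
            skips += 1
            continue
        if (n == 0).any():                        # initialize: pull each arm once
            k = int(np.argmin(n))
        else:
            rad = np.sqrt(2.0 * np.log(t) / n)    # confidence radius
            ucb_r = np.minimum(r_hat + rad, 1.0)  # optimistic reward estimate
            lcb_c = np.maximum(c_hat - rad, 1e-6) # optimistic (low) cost estimate
            k = int(np.argmax(ucb_r / lcb_c))     # best optimistic reward-per-cost
        x = float(rng.random() < means_r[k])      # Bernoulli reward draw
        c = float(rng.random() < means_c[k])      # Bernoulli cost draw
        n[k] += 1
        r_hat[k] += (x - r_hat[k]) / n[k]         # incremental mean updates
        c_hat[k] += (c - c_hat[k]) / n[k]
        total_reward += x
        spent += c
    return total_reward, skips

# Example: three arms with differing reward/cost profiles.
print(suak_style_sketch(means_r=[0.3, 0.8, 0.5], means_c=[0.2, 0.9, 0.4]))
```

The `margin` term mirrors the abstract's observation that slightly under-utilizing the budget keeps skipped rounds rare.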
Related papers
- Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits [1.4732811715354452]
We consider a $K$-armed bandit problem in which each arm consumes a different amount of resources when selected.
We propose a series of algorithms that are randomized like Thompson Sampling but more carefully optimize their decisions with respect to the budget constraint.
arXiv Detail & Related papers (2024-08-28T04:56:06Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Bandits with Replenishable Knapsacks: the Best of both Worlds [26.786438972889208]
We study a natural generalization of the BwK framework which allows non-monotonic resource utilization.
Under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound.
arXiv Detail & Related papers (2023-06-14T12:34:00Z)
- Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness [1.6741394365746018]
We consider contextual bandit problems with knapsacks [CBwK], a problem where at each round, a scalar reward is obtained and vector-valued costs are suffered.
We introduce a dual strategy based on projected-gradient-descent updates, that is able to deal with total-cost constraints of the order of $\sqrt{T}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2023-05-25T07:41:35Z)
- Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits [81.60136088841948]
We propose a probably anytime-safe algorithm that minimizes the regret over the time horizon $T$.
The proposed algorithm is applicable to domains such as recommendation systems and transportation.
arXiv Detail & Related papers (2023-01-31T03:49:00Z)
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z)
- A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both when rewards and constraints are selected according to an unknown model and when they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z)
- A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback [22.17503016665027]
We consider problems where each action returns a random reward, cost, and penalty from an unknown joint distribution.
We propose a novel low-complexity algorithm, named $\texttt{LyOn}$, and prove that it achieves $O(\sqrt{B\log B})$ regret and $O(\log B/B)$ constraint violation.
The low computational cost and performance bounds of $\texttt{LyOn}$ suggest that the Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems.
arXiv Detail & Related papers (2021-06-09T16:12:07Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (see the sketch after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
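As a companion to the "meta-UCB" summary above, here is a minimal Python sketch of the idea as described: each base bandit algorithm is treated as an arm of a higher-level $N$-armed bandit and chosen by a standard UCB1 rule. The `select`/`update` interface and the epsilon-greedy base learner are hypothetical stand-ins, not the paper's actual procedure.

```python
import math
import random

class EpsGreedy:
    """Hypothetical base learner: epsilon-greedy over K arms."""
    def __init__(self, K, eps=0.1):
        self.K, self.eps = K, eps
        self.n = [0] * K
        self.mean = [0.0] * K
    def select(self):
        if random.random() < self.eps or 0 in self.n:
            return random.randrange(self.K)      # explore
        return max(range(self.K), key=lambda a: self.mean[a])  # exploit
    def update(self, arm, reward):
        self.n[arm] += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.n[arm]

class MetaUCB:
    """Treats N base bandit algorithms as arms of an N-armed UCB1 bandit."""
    def __init__(self, base_algorithms):
        self.base = base_algorithms
        self.n = [0] * len(base_algorithms)       # times each base was in control
        self.mean = [0.0] * len(base_algorithms)  # its average reward when chosen
        self.t = 0
    def step(self, pull):
        """One round: pick a base algorithm by UCB1, let it act, feed back reward."""
        self.t += 1
        if 0 in self.n:                           # try each base algorithm once
            i = self.n.index(0)
        else:
            i = max(range(len(self.base)),
                    key=lambda j: self.mean[j]
                    + math.sqrt(2.0 * math.log(self.t) / self.n[j]))
        arm = self.base[i].select()               # chosen base picks the arm
        reward = pull(arm)
        self.base[i].update(arm, reward)          # chosen base learns from it
        self.n[i] += 1
        self.mean[i] += (reward - self.mean[i]) / self.n[i]
        return reward

# Example: two epsilon-greedy learners combined on three Bernoulli arms.
random.seed(0)
means = [0.2, 0.5, 0.7]
meta = MetaUCB([EpsGreedy(3, eps=0.05), EpsGreedy(3, eps=0.2)])
total = sum(meta.step(lambda a: float(random.random() < means[a]))
            for _ in range(5000))
print(total)
```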
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.