Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback
- URL: http://arxiv.org/abs/2503.12285v1
- Date: Sat, 15 Mar 2025 22:52:27 GMT
- Title: Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback
- Authors: Vaneet Aggarwal, Shweta Jain, Subham Pokhriyal, Christopher John Quinn
- Abstract summary: We study bi-criteria optimization for combinatorial multi-armed bandits (CMAB) with bandit feedback. We propose a general framework that transforms discrete bi-criteria offline approximation algorithms into online algorithms with sublinear regret and cumulative constraint violation guarantees. The framework is applied to problems such as submodular cover, submodular cost covering, and fair submodular maximization, highlighting its broad utility in adapting offline guarantees to online bi-criteria optimization under bandit feedback.
- Score: 27.613888121859393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study bi-criteria optimization for combinatorial multi-armed bandits (CMAB) with bandit feedback. We propose a general framework that transforms discrete bi-criteria offline approximation algorithms into online algorithms with sublinear regret and cumulative constraint violation (CCV) guarantees. Our framework requires the offline algorithm to provide an $(\alpha, \beta)$-bi-criteria approximation ratio with $\delta$-resilience and utilize $\texttt{N}$ oracle calls to evaluate the objective and constraint functions. We prove that the proposed framework achieves sub-linear regret and CCV, with both bounds scaling as ${O}\left(\delta^{2/3} \texttt{N}^{1/3}T^{2/3}\log^{1/3}(T)\right)$. Crucially, the framework treats the offline algorithm with $\delta$-resilience as a black box, enabling flexible integration of existing approximation algorithms into the CMAB setting. To demonstrate its versatility, we apply our framework to several combinatorial problems, including submodular cover, submodular cost covering, and fair submodular maximization. These applications highlight the framework's broad utility in adapting offline guarantees to online bi-criteria optimization under bandit feedback.
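To make the black-box reduction concrete, below is a minimal Python sketch of an explore-then-commit style instantiation. It is a simplification under stated assumptions, not the paper's exact procedure (which uses $\texttt{N}$ oracle calls to evaluate the objective and constraint values of sets queried by the offline algorithm); the names `offline_oracle`, `pull_superarm`, and the singleton exploration scheme are illustrative.

```python
import numpy as np

def etc_bi_criteria(offline_oracle, base_arms, pull_superarm, T, explore_rounds):
    """Hedged sketch of an explore-then-commit reduction in the spirit of the
    framework: estimate per-arm statistics under bandit feedback, then hand the
    empirical estimates to a black-box offline (alpha, beta)-bi-criteria
    approximation algorithm and commit to its solution.  All function and
    parameter names are illustrative, not the paper's notation."""
    n = len(base_arms)
    est_reward = np.zeros(n)   # empirical objective estimates
    est_cost = np.zeros(n)     # empirical constraint estimates
    counts = np.zeros(n)

    t = 0
    # Exploration phase: pull singleton super-arms to estimate each base arm.
    while t < explore_rounds * n and t < T:
        i = t % n
        reward, cost = pull_superarm({base_arms[i]})   # bandit feedback on the chosen set
        counts[i] += 1
        est_reward[i] += (reward - est_reward[i]) / counts[i]
        est_cost[i] += (cost - est_cost[i]) / counts[i]
        t += 1

    # Commit phase: the offline algorithm is used purely as a black box on the
    # empirical estimates; delta-resilience is what keeps its (alpha, beta)
    # guarantee meaningful under estimation error.
    chosen_set = offline_oracle(est_reward, est_cost)
    while t < T:
        pull_superarm(chosen_set)
        t += 1
    return chosen_set
```

The split between exploration and commitment is what drives the $T^{2/3}$-type scaling of both regret and CCV, with the $\delta$-resilience of the offline algorithm controlling how estimation error degrades the $(\alpha, \beta)$ guarantee.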
Related papers
- Stochastic $k$-Submodular Bandits with Full Bandit Feedback [29.705337940879705]
We present the first sublinear $\alpha$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback. A key contribution of our work is analyzing the robustness of the algorithms.
arXiv Detail & Related papers (2024-12-14T05:02:53Z) - Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z) - Unified Projection-Free Algorithms for Adversarial DR-Submodular Optimization [28.598226670015315]
This paper introduces unified projection-free Frank-Wolfe type algorithms for adversarial DR-submodular optimization.
For every problem considered in the non-monotone setting, the proposed algorithms are either the first with proven sub-linear $\alpha$-regret bounds or have better $\alpha$-regret bounds than the state of the art.
arXiv Detail & Related papers (2024-03-15T07:05:44Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback [27.192028744078282]
We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular optimization.
arXiv Detail & Related papers (2023-01-30T23:18:06Z) - Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe($\psi$, $\phi$), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (a minimal sketch of this construction appears after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Submodular Bandit Problem Under Multiple Constraints [8.100450025624443]
We introduce a submodular bandit problem under the intersection of $l$ knapsacks and a $k$-system constraint.
To solve this problem, we propose a non-greedy algorithm that adaptively focuses on a standard or modified upper-confidence bound.
We provide a high-probability upper bound on the approximation regret, where the approximation ratio matches that of a fast algorithm.
arXiv Detail & Related papers (2020-06-01T01:28:44Z)
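Regarding the "meta-UCB" combiner from "Upper Confidence Bounds for Combining Stochastic Bandits" above, the following is a minimal sketch of the general idea only; the `select`/`update` interface, the averaging scheme, and the exploration constant are assumptions, not the authors' exact procedure.

```python
import math

class MetaUCB:
    """Hedged sketch: treat N base bandit algorithms as arms of a higher-level
    N-armed bandit and choose among them with a UCB rule on their observed rewards.
    The base-algorithm interface and the exploration constant are assumptions."""
    def __init__(self, base_algorithms):
        self.algos = base_algorithms            # each exposes select() -> arm, update(arm, reward)
        self.counts = [0] * len(base_algorithms)
        self.means = [0.0] * len(base_algorithms)
        self.t = 0

    def step(self, pull_arm):
        self.t += 1

        # Pick the base algorithm with the highest upper confidence bound
        # (each algorithm is tried once before UCB indices are compared).
        def ucb(i):
            if self.counts[i] == 0:
                return float("inf")
            return self.means[i] + math.sqrt(2 * math.log(self.t) / self.counts[i])

        i = max(range(len(self.algos)), key=ucb)

        arm = self.algos[i].select()            # delegate the actual arm choice
        reward = pull_arm(arm)                  # bandit feedback from the environment
        self.algos[i].update(arm, reward)       # feed the reward back to the chosen algorithm
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
        return arm, reward
```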