Budget-Constrained Bandits over General Cost and Reward Distributions
- URL: http://arxiv.org/abs/2003.00365v1
- Date: Sat, 29 Feb 2020 23:50:08 GMT
- Title: Budget-Constrained Bandits over General Cost and Reward Distributions
- Authors: Semih Cayci, Atilla Eryilmaz, R. Srikant
- Abstract summary: We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return.
We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$.
- Score: 32.63624728528415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a budget-constrained bandit problem where each arm pull incurs a
random cost, and yields a random reward in return. The objective is to maximize
the total expected reward under a budget constraint on the total cost. The
model is general in the sense that it allows correlated and potentially
heavy-tailed cost-reward pairs that can take on negative values as required by
many applications. We show that if moments of order $(2+\gamma)$ for some
$\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable
for a budget $B>0$. In order to achieve tight regret bounds, we propose
algorithms that exploit the correlation between the cost and reward of each arm
by extracting the common information via linear minimum mean-square error
estimation. We prove a regret lower bound for this problem, and show that the
proposed algorithms achieve tight problem-dependent regret bounds, which are
optimal up to a universal constant factor in the case of jointly Gaussian cost
and reward pairs.
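The estimation idea in the abstract can be made concrete. Since cost and reward are correlated, the LMMSE estimate of the reward given the cost, $\hat{R} = \mu_R + \frac{\mathrm{Cov}(C,R)}{\mathrm{Var}(C)}(C - \mu_C)$, leaves a residual with variance $(1-\rho^2)\mathrm{Var}(R)$, which tightens the confidence intervals driving a UCB-style index on the reward rate. The sketch below is illustrative only, not the authors' exact algorithm: the index construction, confidence radii, and all names (`lmmse_rate_index`, `budgeted_bandit`) are assumptions.

```python
import math
import numpy as np

def lmmse_rate_index(costs, rewards, t, eps=1e-6):
    """UCB-style index on the reward rate E[R]/E[C] of a single arm.

    Illustrative sketch: the LMMSE coefficient beta = Cov(C,R)/Var(C)
    forms the residual Z = R - beta*C, whose sample mean plus
    beta*mean(C) recovers the reward mean but has variance
    (1 - rho^2)*Var(R), so the confidence width shrinks when cost
    and reward are strongly correlated.
    """
    c, r = np.asarray(costs), np.asarray(rewards)
    n = len(c)
    beta = np.cov(c, r, bias=True)[0, 1] / (c.var() + eps)
    z = r - beta * c                                # low-variance residual
    rad = math.sqrt(2.0 * math.log(max(t, 2)) / n)  # confidence radius
    ucb_reward = z.mean() + beta * c.mean() + (z.std() + abs(beta) * c.std()) * rad
    lcb_cost = max(c.mean() - c.std() * rad, eps)   # assumes positive mean costs
    return ucb_reward / lcb_cost

def budgeted_bandit(arms, budget):
    """Spend `budget` pulling arms; each arm is a callable -> (cost, reward)."""
    K = len(arms)
    cost_hist, rew_hist = [[] for _ in range(K)], [[] for _ in range(K)]
    spent = total = 0.0
    for k in range(K):                              # one initial pull per arm
        c, r = arms[k]()
        cost_hist[k].append(c); rew_hist[k].append(r)
        spent += c; total += r
    t = K
    while spent < budget:
        t += 1
        k = max(range(K), key=lambda a: lmmse_rate_index(cost_hist[a], rew_hist[a], t))
        c, r = arms[k]()
        cost_hist[k].append(c); rew_hist[k].append(r)
        spent += c; total += r
    return total

# Example: two arms with jointly Gaussian, correlated (cost, reward) pairs,
# matching the distributional setting in which the bounds are shown tight.
rng = np.random.default_rng(0)
def gaussian_arm(mean_cost, mean_reward, corr, rng=rng):
    def pull():
        c = rng.normal(mean_cost, 0.2)
        r = rng.normal(mean_reward + corr * (c - mean_cost), 0.2)
        return c, r
    return pull

print(budgeted_bandit([gaussian_arm(1.0, 1.0, 0.8),
                       gaussian_arm(1.0, 0.7, 0.8)], budget=200.0))
```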
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm, and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Confounded Budgeted Causal Bandits [28.199741662190203]
We study the problem of learning 'good' interventions in an environment modeled by its underlying causal graph.
Good interventions refer to interventions that maximize rewards.
We propose an algorithm to minimize the cumulative regret in general causal graphs.
arXiv Detail & Related papers (2024-01-15T10:26:13Z) - Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness [1.6741394365746018]
We consider contextual bandit problems with knapsacks (CBwK), a setting where at each round a scalar reward is obtained and vector-valued costs are suffered.
We introduce a dual strategy based on projected-gradient-descent updates that is able to deal with total-cost constraints of the order of $\sqrt{T}$, up to poly-logarithmic terms.
arXiv Detail & Related papers (2023-05-25T07:41:35Z) - Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits [8.508198765617198]
We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds.
We also provide an $O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition.
arXiv Detail & Related papers (2022-09-15T00:20:38Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback [22.17503016665027]
We consider problems where each action returns a random reward, cost, and penalty from an unknown joint distribution.
We propose a novel low-complexity algorithm, named $\texttt{LyOn}$, and prove that it achieves $O(\sqrt{B \log B})$ regret and $O(\log B / B)$ constraint violation.
The low computational cost and strong performance bounds of $\texttt{LyOn}$ suggest that Lyapunov-based algorithm design can be effective in solving constrained bandit optimization problems; a minimal drift-plus-penalty sketch in this spirit appears after this list.
arXiv Detail & Related papers (2021-06-09T16:12:07Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2 \sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term depending on the corruption level $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
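For the Lyapunov-based methodology cited above ($\texttt{LyOn}$), a minimal drift-plus-penalty sketch follows. It is an illustration under stated assumptions rather than the published algorithm: a virtual queue tracks cumulative constraint violation, and each round maximizes a queue-weighted combination of optimistic reward and pessimistic cost estimates; the trade-off parameter `V`, the confidence radii, and the interface are all assumptions.

```python
import math
import numpy as np

def lyapunov_constrained_bandit(arms, horizon, cost_cap, V=10.0):
    """Drift-plus-penalty sketch for a cost-constrained bandit.

    `arms` is a list of callables returning (reward, cost); `cost_cap`
    is the per-round budget the time-average cost should respect.
    Illustrative only -- not the published LyOn algorithm.
    """
    K = len(arms)
    n = np.zeros(K)                      # pull counts
    mu_r = np.zeros(K)                   # empirical mean rewards
    mu_c = np.zeros(K)                   # empirical mean costs
    Q = 0.0                              # virtual queue: cumulative violation
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            k = t - 1                    # initialization: pull each arm once
        else:
            rad = np.sqrt(2.0 * math.log(t) / n)
            # Optimism on reward, pessimism on cost, weighted by the queue:
            # large Q steers toward cheap arms (drift control), large V
            # steers toward high-reward arms (penalty control).
            k = int(np.argmax(V * (mu_r + rad) - Q * (mu_c - rad)))
        r, c = arms[k]()
        n[k] += 1
        mu_r[k] += (r - mu_r[k]) / n[k]  # incremental mean updates
        mu_c[k] += (c - mu_c[k]) / n[k]
        Q = max(Q + c - cost_cap, 0.0)   # queue update toward feasibility
        total += r
    return total, Q
```

The virtual queue plays the role of a dual variable, so this structure is closely related to the projected-gradient-descent dual updates used in the CBwK entry above; the choice of $V$ trades off reward maximization against the rate at which constraint violation is driven down.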