The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based Approach
- URL: http://arxiv.org/abs/2102.06385v1
- Date: Fri, 12 Feb 2021 08:14:30 GMT
- Title: The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based Approach
- Authors: Xiaocheng Li, Chunlin Sun, Yinyu Ye
- Abstract summary: We develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound.
The sub-optimality measure highlights the important role of knapsacks in determining regret.
This is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
- Score: 15.626602292752624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the bandits with knapsacks (BwK) problem and develop
a primal-dual based algorithm that achieves a problem-dependent logarithmic
regret bound. The BwK problem extends the multi-armed bandit (MAB) problem to
model the resource consumption associated with playing each arm, and the
existing BwK literature has mainly focused on deriving asymptotically
optimal distribution-free regret bounds. We first study the primal and dual
linear programs underlying the BwK problem. From this primal-dual perspective,
we discover symmetry between arms and knapsacks, and then propose a new notion
of sub-optimality measure for the BwK problem. The sub-optimality measure
highlights the important role of knapsacks in determining algorithm regret and
inspires the design of our two-phase algorithm. In the first phase, the
algorithm identifies the optimal arms and the binding knapsacks, and in the
second phase, it exhausts the binding knapsacks via playing the optimal arms
through an adaptive procedure. Our regret upper bound involves the proposed
sub-optimality measure and has a logarithmic dependence on the length of the
horizon $T$ and a polynomial dependence on $m$ (the number of arms) and $d$
(the number of knapsacks). To the best of our knowledge, this is the first
problem-dependent logarithmic regret bound for the general BwK problem.
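To make the primal-dual view concrete, here is a minimal sketch (my illustration under assumed instance data, not the authors' algorithm) of the fluid LP underlying BwK: it solves the primal with SciPy's HiGHS backend and reads off the optimal arms and the binding knapsacks, the two objects the first phase of the two-phase algorithm identifies. The instance `mu`, `C`, `B`, `T` and the tolerance are hypothetical.

```python
# A minimal sketch (not the authors' implementation) of the fluid LP
# underlying BwK:
#   max  mu @ x   s.t.   C @ x <= B,   sum(x) <= T,   x >= 0,
# where x_i is the expected number of plays of arm i over the horizon.
# Optimal arms are those with x_i > 0; binding knapsacks are the
# resource constraints with zero slack. Assumes SciPy >= 1.7 (the HiGHS
# solver exposes dual values via `marginals`).
import numpy as np
from scipy.optimize import linprog

def bwk_fluid_lp(mu, C, B, T, tol=1e-9):
    """mu: mean rewards (m,); C: mean consumptions (d, m);
    B: budgets (d,); T: horizon length."""
    m = len(mu)
    A_ub = np.vstack([C, np.ones((1, m))])   # d knapsacks + the time constraint
    b_ub = np.concatenate([B, [T]])
    # linprog minimizes, so negate the rewards.
    res = linprog(-mu, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    x = res.x
    optimal_arms = np.flatnonzero(x > tol)       # arms played at a positive rate
    slack = res.ineqlin.residual[:-1]            # B_j - (C @ x)_j per knapsack
    duals = -res.ineqlin.marginals[:-1]          # knapsack shadow prices (>= 0)
    binding_knapsacks = np.flatnonzero(slack < tol)
    return optimal_arms, binding_knapsacks, x, duals

# Hypothetical toy instance: 3 arms, 2 resources, horizon 100.
mu = np.array([1.0, 0.8, 0.3])
C = np.array([[0.5, 0.2, 0.1],
              [0.1, 0.4, 0.3]])
arms, binding, x, y = bwk_fluid_lp(mu, C, B=np.array([30.0, 25.0]), T=100.0)
print("optimal arms:", arms, "binding knapsacks:", binding)
```

The symmetry the abstract refers to is visible here: arms enter as primal variables and knapsacks as dual prices, so identifying the optimal arms and identifying the binding knapsacks are two faces of the same LP.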
Related papers
- Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z)
- High-dimensional Linear Bandits with Knapsacks [8.862707047517913]
We study the contextual bandits with knapsacks (CBwK) problem in the high-dimensional setting where the feature dimension is large.
We develop an online variant of the hard thresholding algorithm that performs the sparse estimation sequentially.
We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension.
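As a sketch of the generic technique (not the paper's exact estimator), one online hard-thresholding round takes a gradient step on the squared loss and then keeps only the `s` largest-magnitude coordinates; the step size and sparsity level below are hypothetical.

```python
# A generic online hard-thresholding update for sparse linear estimation:
# one stochastic gradient step on the squared loss, then keep only the s
# largest-magnitude coordinates. A sketch of the technique, not the
# paper's exact estimator; step size and sparsity level are hypothetical.
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def online_ht_update(theta, x_t, y_t, s, lr=0.05):
    """One round: observe feature x_t and response y_t, take one gradient
    step on (x_t @ theta - y_t)^2 / 2, then project onto s-sparse vectors."""
    grad = (x_t @ theta - y_t) * x_t
    return hard_threshold(theta - lr * grad, s)
```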
arXiv Detail & Related papers (2023-11-02T15:40:33Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound of $\tilde{O}\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^K \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- Complexity Analysis of a Countable-armed Bandit Problem [9.163501953373068]
We study the classical problem of minimizing the expected cumulative regret over a horizon of play $n$.
We propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of $\mathcal{O}\left(\log n\right)$ when $K=2$.
While the order of regret and the complexity of the problem suggest a great degree of similarity to the classical MAB problem, the properties of the performance bounds and salient aspects of algorithm design are quite distinct from the latter.
arXiv Detail & Related papers (2023-01-18T00:53:46Z)
- Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z)
- Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ two non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z)
- First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Recurrent Submodular Welfare and Matroid Blocking Bandits [22.65352007353614]
A recent line of research focuses on the study of the multi-armed bandit (MAB) problem.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
arXiv Detail & Related papers (2021-01-30T21:51:47Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
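The mechanism described in that abstract can be sketched as a plain UCB1 layer over the base algorithms; this is my simplification of the described idea, not the paper's exact procedure, and the `base_algs` interface (`select_arm`, `update`) is hypothetical.

```python
# A minimal sketch of the "meta-UCB" idea: run UCB1 over N base bandit
# algorithms, each round delegating the action choice to the selected
# base algorithm and crediting it with the observed reward. This is a
# simplification of the abstract's description, not the paper's exact
# procedure; `base_algs` objects with .select_arm() / .update(arm, r)
# are a hypothetical interface.
import math

def meta_ucb(base_algs, pull_arm, T):
    N = len(base_algs)
    n = [0] * N        # number of rounds each base algorithm was chosen
    s = [0.0] * N      # cumulative reward credited to each base algorithm
    for t in range(1, T + 1):
        if t <= N:
            i = t - 1  # play each base algorithm once to initialize
        else:
            i = max(range(N),
                    key=lambda j: s[j] / n[j] + math.sqrt(2 * math.log(t) / n[j]))
        arm = base_algs[i].select_arm()   # delegate the action choice
        reward = pull_arm(arm)            # environment feedback
        base_algs[i].update(arm, reward)  # let the chosen base algorithm learn
        n[i] += 1
        s[i] += reward
    return n, s
```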