High-dimensional Linear Bandits with Knapsacks
- URL: http://arxiv.org/abs/2311.01327v1
- Date: Thu, 2 Nov 2023 15:40:33 GMT
- Title: High-dimensional Linear Bandits with Knapsacks
- Authors: Wanteng Ma, Dong Xia and Jiashuo Jiang
- Abstract summary: We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large.
We develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner.
We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension.
- Score: 8.862707047517913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the contextual bandits with knapsack (CBwK) problem under the
high-dimensional setting where the dimension of the feature is large. The
reward of pulling each arm equals the inner product of a sparse
high-dimensional weight vector and the feature vector of the current arrival, with
additional random noise. In this paper, we investigate how to exploit this
sparsity structure to achieve improved regret for the CBwK problem. To this
end, we first develop an online variant of the hard thresholding algorithm that
performs the sparse estimation in an online manner. We further combine our
online estimator with a primal-dual framework, where we assign a dual variable
to each knapsack constraint and utilize an online learning algorithm to update
the dual variable, thereby controlling the consumption of the knapsack
capacity. We show that this integrated approach allows us to achieve a
sublinear regret that depends logarithmically on the feature dimension, thus
improving the polynomial dependency established in the previous literature. We
also apply our framework to the high-dimensional contextual bandit problem
without the knapsack constraint and achieve optimal regret in both the
data-poor regime and the data-rich regime. We finally conduct numerical
experiments to demonstrate the efficient empirical performance of our algorithms
in the high-dimensional setting.
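The two ingredients described in the abstract — an online hard-thresholding update for the sparse reward estimate and a per-constraint dual variable updated by online learning to pace knapsack consumption — can be sketched as follows. This is a minimal illustration under synthetic assumptions, not the paper's actual algorithm: the function names, step sizes, and the `budget / T` pacing target are all illustrative choices.

```python
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w; zero out the rest."""
    out = np.zeros_like(w)
    if s > 0:
        idx = np.argsort(np.abs(w))[-s:]
        out[idx] = w[idx]
    return out

def cbwk_sketch(contexts, rewards, costs, budget, s=5, eta_w=0.1, eta_lam=0.1):
    """One pass of an online-hard-thresholding + primal-dual loop (illustrative).

    contexts: (T, K, d) feature of each arm at each round
    rewards:  (T, K) realized reward of each arm (only the pulled arm's is used)
    costs:    (T, K, m) resource consumption of each arm
    budget:   (m,) total knapsack capacity
    """
    T, K, d = contexts.shape
    m = budget.shape[0]
    w = np.zeros(d)            # sparse estimate of the reward weight vector
    lam = np.zeros(m)          # one dual variable per knapsack constraint
    remaining = budget.astype(float)
    total_reward = 0.0
    for t in range(T):
        if np.any(remaining <= 0):
            break              # some resource exhausted: stop pulling
        # pick the arm maximizing estimated reward minus dual-priced cost
        scores = contexts[t] @ w - costs[t] @ lam
        a = int(np.argmax(scores))
        total_reward += rewards[t, a]
        remaining -= costs[t, a]
        # online sparse estimation: gradient step, then hard-threshold to s entries
        x = contexts[t, a]
        grad = (x @ w - rewards[t, a]) * x
        w = hard_threshold(w - eta_w * grad, s)
        # dual ascent: raise the price of resources consumed faster than budget/T
        lam = np.maximum(0.0, lam + eta_lam * (costs[t, a] - budget / T))
    return total_reward, w, lam
```

The dual prices `lam` penalize arms whose consumption outpaces the average per-round budget, which is how the primal-dual framework controls knapsack capacity; the hard-thresholding step keeps the running estimate `w` at most `s`-sparse.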
Related papers
- Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption.
We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z)
- PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z)
- Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure.
As the first work to target the adversarial setting, we design a meta-algorithm with setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights suffices to online learn a non-smooth and non-convex sequence of affine functions of Bregman divergences.
arXiv Detail & Related papers (2022-05-27T17:40:32Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- On Learning to Rank Long Sequences with Contextual Bandits [17.97356309346139]
We introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses.
Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature.
arXiv Detail & Related papers (2021-06-07T12:16:34Z)
- The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based Approach [15.626602292752624]
We develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound.
The sub-optimality measure highlights the important role of knapsacks in determining regret.
This is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
arXiv Detail & Related papers (2021-02-12T08:14:30Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
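For readers unfamiliar with the auxiliary-variable idea, a generic univariate slice sampler with stepping-out (Neal, 2003) looks like the sketch below. This is only the textbook building block; the paper's CRM-specific construction, where slice variables adaptively set the truncation level, is more involved.

```python
import numpy as np

def slice_sample(logpdf, x0, n, width=1.0, seed=None):
    """Univariate slice sampler with stepping-out and shrinkage (Neal, 2003)."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n)
    x = x0
    for i in range(n):
        # auxiliary variable: a uniform height under the (log) density at x
        y = logpdf(x) + np.log(rng.uniform())
        # step out until the interval brackets the slice {x' : logpdf(x') > y}
        lo = x - width * rng.uniform()
        hi = lo + width
        while logpdf(lo) > y:
            lo -= width
        while logpdf(hi) > y:
            hi += width
        # shrink the interval until a point inside the slice is drawn
        while True:
            xc = rng.uniform(lo, hi)
            if logpdf(xc) > y:
                x = xc
                break
            if xc < x:
                lo = xc
            else:
                hi = xc
        samples[i] = x
    return samples
```

Because each draw is uniform on the slice under the density, the chain leaves the target distribution invariant without any step-size tuning beyond the initial `width`.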
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
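The sketch-and-solve idea behind such randomized embeddings can be illustrated for L2-regularized least squares with a plain Gaussian sketch. This is a textbook baseline, not the paper's effective-dimension-adaptive method; the sketch dimension below is an arbitrary illustrative choice.

```python
import numpy as np

def sketched_ridge(A, b, lam, sketch_dim, seed=None):
    """Approximately solve min_x ||Ax - b||^2 + lam * ||x||^2 via a Gaussian sketch.

    The n x d data matrix A is compressed to sketch_dim rows by a random
    Gaussian embedding S, and the small sketched ridge problem is solved exactly.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.normal(scale=1.0 / np.sqrt(sketch_dim), size=(sketch_dim, n))
    SA, Sb = S @ A, S @ b  # sketched data: sketch_dim x d and sketch_dim
    # normal equations of the sketched regularized problem
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```

With `sketch_dim` well above `d`, the Gaussian embedding approximately preserves the column span of `A`, so the sketched solution's residual is close to that of the exact ridge solution at a fraction of the cost when `n` is large.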
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.