Related papers: High-dimensional Linear Bandits with Knapsacks

High-dimensional Linear Bandits with Knapsacks

URL: http://arxiv.org/abs/2311.01327v2
Date: Sat, 02 Aug 2025 07:21:42 GMT
Title: High-dimensional Linear Bandits with Knapsacks
Authors: Wanteng Ma, Dong Xia, Jiashuo Jiang,
Abstract summary: We develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner.<n>We show that either of the following structural assumptions is sufficient for a sharper regret bound of $tildeO(s_0 sqrtT)$.<n>As a by-product, applying our framework to high-dimensional contextual bandits without knapsack constraints recovers the optimal regret rates in both the data-poor and data-rich regimes.
Score: 7.8856737627153874
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the contextual bandits with knapsack (CBwK) problem in a high-dimensional linear setting, where the feature dimension can be very large. Our goal is to harness sparsity to obtain sharper regret guarantees. To this end, we first develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner. We then embed this estimator in a primal-dual scheme: every knapsack constraint is paired with a dual variable, which is updated by an online learning rule to keep the cumulative resource consumption within budget. This integrated approach achieves a two-phase sub-linear regret that scales only logarithmically with the feature dimension, improving on the polynomial dependency reported in prior work. Furthermore, we show that either of the following structural assumptions is sufficient for a sharper regret bound of $\tilde{O}(s_{0} \sqrt{T})$: (i) a diverse-covariate condition; and (ii) a margin condition. When both conditions hold simultaneously, we can further control the regret to $O(s_{0}^{2} \log(dT)\log T)$ by a dual resolving scheme. As a by-product, applying our framework to high-dimensional contextual bandits without knapsack constraints recovers the optimal regret rates in both the data-poor and data-rich regimes. Finally, numerical experiments confirm the empirical efficiency of our algorithms in high-dimensional settings.

Related papers

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration [6.287267171078442]
We propose variance-aware algorithms that leverage neural networks to approximate nonlinear utility functions.<n>We establish theoretical guarantees showing that our algorithms achieve sublinear cumulative average regret of order $bigollt(d sqrtsum_t=1T sigma_t2 + sqrtdTrt),$ for sufficiently wide neural networks.
arXiv Detail & Related papers (2025-06-02T01:58:48Z)
Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates [10.04289564826712]
We develop a contextual bandit algorithm based on sparse additive reward models in reproducing kernel Hilbert spaces. We establish statistical properties of the doubly penalized method applied to random regions, introducing novel analyses under bandit feedback.
arXiv Detail & Related papers (2025-03-21T08:33:28Z)
Sparse Linear Bandits with Blocking Constraints [22.01704171400845]
We investigate the high-dimensional sparse linear bandits problem in a data-poor regime.<n>We show novel offline statistical guarantees of the lasso estimator for the linear model.<n>We propose a meta-algorithm based on corralling that does not need knowledge of optimal sparsity parameter $k$ at minimal cost to regret.
arXiv Detail & Related papers (2024-10-26T01:42:03Z)
FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits [8.908421753758475]
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems. Motivated by data privacy concerns, we study the joint differentially private high dimensional sparse linear bandits. We show that FLIPHAT achieves optimal regret in terms of privacy parameters.
arXiv Detail & Related papers (2024-05-22T22:19:12Z)
No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization [26.415300249303748]
We show that it is possible to circumvent the issue of sublinear violations of constraints by requiring the primal and dual algorithm to be weakly adaptive. In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $rho/(1+rho)$. This results allow us to obtain new result for the problem of contextual bandits with linear constraints.
arXiv Detail & Related papers (2024-05-10T16:22:33Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in citelu2021low under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrain of existing algorithms, we first propose the G-ESTT framework. We show that G-ESTT can achieve the $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret while G-ESTS can achineve the $tildeO
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM) We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise. Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret. We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression [65.8785736964253]
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions.
arXiv Detail & Related papers (2022-11-14T16:08:44Z)
PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt. We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z)
Meta-Learning Adversarial Bandits [49.094361442409785]
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial setting, we design a meta-algorithm that setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit optimization (BLO) Our guarantees rely on proving that unregularized follow-the-leader combined with multiplicative weights is enough to online learn a non-smooth and non-B sequence.
arXiv Detail & Related papers (2022-05-27T17:40:32Z)
First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $mathcalO(sqrtV_1star K)$ in reinforcement learning with large state spaces. We demonstrate that existing techniques based on at least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality. We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
On Learning to Rank Long Sequences with Contextual Bandits [17.97356309346139]
We introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, results in sharper guarantees than previously available in the literature.
arXiv Detail & Related papers (2021-06-07T12:16:34Z)
The Symmetry between Bandits and Knapsacks: A Primal-Dual LP-based Approach [15.626602292752624]
We develop a primal-dual based algorithm that achieves a problem-dependent logarithmic regret bound. The sub-optimality measure highlights the important role of knapsacks in determining regret. This is the first problem-dependent logarithmic regret bound for solving the general BwK problem.
arXiv Detail & Related papers (2021-02-12T08:14:30Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching. We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.