Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates
- URL: http://arxiv.org/abs/2503.16941v1
- Date: Fri, 21 Mar 2025 08:33:28 GMT
- Title: Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates
- Authors: Wenjia Wang, Qingwen Zhang, Xiaowei Zhang,
- Abstract summary: We develop a contextual bandit algorithm based on sparse additive reward models in reproducing kernel Hilbert spaces.<n>We establish statistical properties of the doubly penalized method applied to random regions, introducing novel analyses under bandit feedback.
- Score: 10.04289564826712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized services are central to today's digital landscape, where online decision-making is commonly formulated as contextual bandit problems. Two key challenges emerge in modern applications: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate relationships. We address these challenges by developing a contextual bandit algorithm based on sparse additive reward models in reproducing kernel Hilbert spaces. We establish statistical properties of the doubly penalized method applied to random regions, introducing novel analyses under bandit feedback. Our algorithm achieves sublinear cumulative regret over the time horizon $T$ while scaling logarithmically with covariate dimensionality $d$. Notably, we provide the first regret upper bound with logarithmic growth in $d$ for nonparametric contextual bandits with high-dimensional covariates. We also establish a lower bound, with the gap to the upper bound vanishing as smoothness increases. Extensive numerical experiments demonstrate our algorithm's superior performance in high-dimensional settings compared to existing approaches.
Related papers
- High-dimensional Linear Bandits with Knapsacks [8.862707047517913]
We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large.
We develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner.
We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension.
arXiv Detail & Related papers (2023-11-02T15:40:33Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z) - On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_pbeta(mathcalX)$, $pgeq 2, beta>fracdp$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $beta> fracd2$ or $p=infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - High-Dimensional Sparse Linear Bandits [67.9378546011416]
We derive a novel $Omega(n2/3)$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime.
We also prove a dimension-free $O(sqrtn)$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.
arXiv Detail & Related papers (2020-11-08T16:48:11Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.