Exploration via Feature Perturbation in Contextual Bandits
- URL: http://arxiv.org/abs/2510.17390v2
- Date: Fri, 24 Oct 2025 08:30:06 GMT
- Title: Exploration via Feature Perturbation in Contextual Bandits
- Authors: Seouh-won Yi, Min-hwan Oh
- Abstract summary: We propose a simple strategy for contextual bandits that injects randomness directly into feature inputs. Remarkably, this algorithm achieves a $\tilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits.
- Score: 33.46701416812218
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves a $\tilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is both computationally efficient and naturally extends to non-parametric or neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with near-optimal regret guarantees.
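The abstract's core idea, injecting noise into the feature inputs before scoring arms rather than sampling parameters or perturbing rewards, can be illustrated with a minimal sketch for a linear contextual bandit. The perturbation scale `sigma`, the ridge-regression update, and all problem sizes below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                    # dimension, arms per round, rounds
theta_true = rng.normal(size=d) / np.sqrt(d)

# Ridge-regression statistics for the running estimate of theta.
A = np.eye(d)                            # regularized Gram matrix
b = np.zeros(d)

for t in range(T):
    X = rng.normal(size=(K, d))          # K candidate feature vectors
    theta_hat = np.linalg.solve(A, b)    # current least-squares estimate

    # Feature perturbation: randomize the *inputs* before scoring,
    # instead of sampling a parameter (Thompson) or adding a bonus (UCB).
    sigma = 0.1                          # assumed perturbation scale
    X_tilde = X + sigma * rng.normal(size=X.shape)
    arm = int(np.argmax(X_tilde @ theta_hat))

    # Observe a noisy linear reward and update the statistics.
    reward = X[arm] @ theta_true + 0.1 * rng.normal()
    A += np.outer(X[arm], X[arm])
    b += reward * X[arm]
```

Because exploration lives entirely in the input noise, the same scoring step works unchanged if the linear model `X @ theta_hat` is replaced by any fitted regressor, which is the computational advantage the abstract points to.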
Related papers
- Efficient Simple Regret Algorithms for Stochastic Contextual Bandits [32.5817931126341]
We study contextual logistic bandits under the simple regret objective. We propose the first algorithm that achieves simple regret $\tilde{\mathcal{O}}(d/\sqrt{T})$. We also introduce a new variant of Thompson Sampling tailored to the simple-regret setting.
arXiv Detail & Related papers (2026-01-29T02:09:13Z) - Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions [8.48717433940334]
We introduce a new problem of generalized linear bandits with unknown reward functions, also known as single index bandits. We first consider the case where the unknown reward function is monotonically increasing, and propose two novel and efficient algorithms, STOR and ESTOR. We then extend our methods to the high-dimensional sparse setting and show that the same regret rate can be attained with the sparsity index.
arXiv Detail & Related papers (2025-06-15T07:19:00Z) - Efficient kernelized bandit algorithms via exploration distributions [13.86858382375188]
We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic. We show that our proposed generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T\sqrt{T})$ regret bounds.
arXiv Detail & Related papers (2025-06-11T18:23:43Z) - Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - Semi-Bandit Learning for Monotone Stochastic Optimization [16.921694787482213]
A generic online learning algorithm is developed for a class of "monotone" problems. Our framework applies to several fundamental problems such as the prophet inequality, Pandora's box, single-resource revenue management, and posted pricing.
arXiv Detail & Related papers (2023-12-24T07:46:37Z) - Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z) - Semi-Random Sparse Recovery in Nearly-Linear Time [37.61139884826181]
We investigate the brittleness of fast sparse recovery algorithms to generative model changes.
Our approach differs from prior fast iterative methods with provable guarantees under semi-random generative models.
We design a new iterative method tailored to the geometry of sparse recovery which is provably robust to our semi-random model.
arXiv Detail & Related papers (2022-03-08T10:56:46Z) - Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $\tilde{O}(\tilde{d}\sqrt{T})$ regret upper bound is still achievable under standard regularity conditions.
We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
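The data-perturbation mechanism described above, in its linear-regression special case, amounts to refitting least squares after adding i.i.d. scalar noise to the regression targets, so that each refit yields a randomized estimate whose spread drives exploration. The sketch below is an assumption-laden illustration of that idea, not the RLSVI-based algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def perturbed_least_squares(X, y, noise_std=1.0, reg=1.0):
    """Refit ridge regression on targets perturbed with i.i.d. noise.

    Each call returns a different randomized estimate; the spread
    across calls plays the role of posterior sampling.
    """
    y_tilde = y + noise_std * rng.normal(size=y.shape)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y_tilde)

# Many perturbed refits scatter around the unperturbed solution,
# mimicking draws from a posterior over the parameter.
X = rng.normal(size=(500, 3))
theta = np.array([1.0, -2.0, 0.5])
y = X @ theta + 0.1 * rng.normal(size=500)
estimates = np.stack([perturbed_least_squares(X, y) for _ in range(200)])
```

The appeal is that randomization is obtained purely through data augmentation, so the same recipe applies whenever the value function is fit by regression, which is why it extends to general function approximation.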
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Sparsity-Agnostic Lasso Bandit [27.383079108028074]
We consider a contextual bandit problem where the dimension $d$ of the feature vectors is potentially large.
All existing algorithms for sparse bandits require a priori knowledge of the value of the sparsity index $s_0$.
We propose an algorithm that does not require prior knowledge of the sparsity index $s_0$ and establish tight regret bounds on its performance under mild conditions.
arXiv Detail & Related papers (2020-07-16T17:24:12Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.