Sparse Nonparametric Contextual Bandits
- URL: http://arxiv.org/abs/2503.16382v1
- Date: Thu, 20 Mar 2025 17:44:56 GMT
- Title: Sparse Nonparametric Contextual Bandits
- Authors: Hamish Flynn, Julia Olkhovskaya, Paul Rognon-Vael
- Abstract summary: We study the problem of simultaneously learning relevant features and minimising regret in contextual bandit problems. We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits. We find that sparsity always enables better regret bounds, as long as the horizon is large enough relative to the sparsity and the number of actions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the problem of simultaneously learning relevant features and minimising regret in contextual bandit problems. We introduce and analyse a new class of contextual bandit problems, called sparse nonparametric contextual bandits, in which the expected reward function lies in the linear span of a small unknown set of features that belongs to a known infinite set of candidate features. We consider two notions of sparsity, for which the set of candidate features is either countable or uncountable. Our contribution is two-fold. First, we provide lower bounds on the minimax regret, which show that polynomial dependence on the number of actions is generally unavoidable in this setting. Second, we show that a variant of the Feel-Good Thompson Sampling algorithm enjoys regret bounds that match our lower bounds up to logarithmic factors of the horizon, and have logarithmic dependence on the effective number of candidate features. When we apply our results to kernelised and neural contextual bandits, we find that sparsity always enables better regret bounds, as long as the horizon is large enough relative to the sparsity and the number of actions.
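The abstract describes rewards that lie in the linear span of a small unknown subset of a large set of candidate features. As a toy illustration of the feature-learning half of that setting, the sketch below recovers a sparse reward vector from context-reward pairs with a Lasso-style ISTA solver. All names and sizes here are hypothetical; this is not the paper's Feel-Good Thompson Sampling algorithm, only a minimal sketch of sparse feature recovery under noiseless rewards.

```python
import numpy as np

# Hypothetical toy instance: the expected reward is a linear combination
# of 2 relevant features out of 50 candidates (sizes chosen for illustration).
rng = np.random.default_rng(0)
n, d = 200, 50                  # observations, candidate features
support = [3, 17]               # unknown relevant features
theta_star = np.zeros(d)
theta_star[support] = [3.0, -2.0]

X = rng.standard_normal((n, d))  # observed feature maps phi(x_t, a_t)
y = X @ theta_star               # noiseless expected rewards

# ISTA: proximal gradient descent on the Lasso objective
#   (1/2n) ||y - X theta||^2 + lam * ||theta||_1
lam = 0.01
step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant of the gradient
theta = np.zeros(d)
for _ in range(500):
    grad = X.T @ (X @ theta - y) / n
    z = theta - step * grad
    theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold

recovered = np.flatnonzero(np.abs(theta) > 0.5)
print(recovered)   # indices of the recovered relevant features
```

With noiseless data and a small penalty, the soft-thresholded estimate concentrates on the two relevant coordinates, which is the "learning relevant features" part of the problem; the bandit part (balancing this estimation against regret) is what the paper's analysis addresses.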
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- A conversion theorem and minimax optimality for continuum contextual bandits [70.71582850199871]
We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret. We show that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret.
arXiv Detail & Related papers (2024-06-09T10:12:08Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
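The abstention protocol summarised above can be sketched in a few lines: each round the agent picks an arm and may then abstain, forfeiting the still-unobserved instantaneous reward in exchange for a known fallback value. The arm means, fallback value, and epsilon-greedy rule below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.2, 0.5, 0.8])   # unknown Bernoulli arm means (hypothetical)
fallback = 0.4                      # deterministic reward when abstaining
counts = np.ones(3)                 # optimistic initial estimates
sums = np.ones(3)
total = 0.0
T = 2000

for t in range(T):
    if rng.random() < 0.1:                   # explore: always accept the reward
        arm = int(rng.integers(3))
    else:                                    # exploit, possibly abstaining
        arm = int(np.argmax(sums / counts))
        if sums[arm] / counts[arm] < fallback:
            total += fallback                # abstain before observing anything
            continue
    r = float(rng.random() < means[arm])     # Bernoulli reward
    counts[arm] += 1
    sums[arm] += r
    total += r

print(total / T)   # average per-round reward
```

The key difference from the canonical protocol is the `continue` branch: abstaining yields the fallback without revealing the arm's reward, so the agent learns nothing on those rounds.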
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Contextual Multinomial Logit Bandits with General Value Functions [47.06746975822902]
Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising.
We consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits.
Specifically, we consider both the computational and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off.
arXiv Detail & Related papers (2024-02-12T23:50:44Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply a $O(d\sqrt{T}\log T)$ high-probability regret bound for contextual linear bandits.
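As a rough illustration of running a linear-bandit rule on per-round contextual features, the sketch below applies a LinUCB-style selection to randomly drawn feature vectors. This is not the paper's reduction, just a standard baseline; the dimensions, bonus scale, and noiseless rewards are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T = 5, 4, 400                        # dimension, arms, horizon (illustrative)
theta_star = rng.standard_normal(d)        # unknown reward parameter

A = np.eye(d)          # regularised Gram matrix  V_t = I + sum_s x_s x_s^T
b = np.zeros(d)        # sum_s x_s r_s
regret = 0.0
for t in range(T):
    X = rng.standard_normal((K, d))        # context: K candidate feature vectors
    theta_hat = np.linalg.solve(A, b)      # ridge estimate of theta_star
    Vinv = np.linalg.inv(A)
    # optimistic index: estimated reward + exploration bonus ||x||_{V^{-1}}
    ucb = X @ theta_hat + 0.5 * np.sqrt(np.einsum('kd,dc,kc->k', X, Vinv, X))
    k = int(np.argmax(ucb))
    r = X[k] @ theta_star                  # noiseless reward for simplicity
    regret += np.max(X @ theta_star) - r
    A += np.outer(X[k], X[k])
    b += X[k] * r

print(regret)   # cumulative regret grows much slower than the horizon T
```

Because the per-round contexts are fed directly to a linear-bandit index, nothing in the loop depends on the context distribution, which is the flavor of the "contextual problem as linear bandit" viewpoint in the summary above.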
arXiv Detail & Related papers (2022-11-08T22:18:53Z)
- A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit [1.2183405753834562]
This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one.
We obtain the leading-order terms of the minimax optimal regret and pseudoregret for this problem by associating each of them with a solution of a linear heat equation.
arXiv Detail & Related papers (2022-02-11T17:03:18Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Sequential Batch Learning in Finite-Action Linear Contextual Bandits [40.01661188919779]
We study the sequential batch learning problem in linear contextual bandits with finite action sets.
This problem provides a finer-grained formulation of many personalized sequential decision-making problems in practical applications.
arXiv Detail & Related papers (2020-04-14T06:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.