Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
- URL: http://arxiv.org/abs/2401.11565v2
- Date: Fri, 22 Mar 2024 21:33:47 GMT
- Title: Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
- Authors: Sharu Theresa Jose, Shana Moothedath
- Abstract summary: We investigate a contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context.
Our objective is to design an action policy that can approximate that of an oracle.
- Score: 4.297070083645049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore a stochastic contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context through a noise channel with an unknown noise parameter. Our objective is to design an action policy that can approximate that of an oracle, which has access to the reward model, the channel parameter, and the predictive distribution of the true context given the observed noisy context. In a Bayesian framework, we introduce a Thompson sampling algorithm for Gaussian bandits with Gaussian context noise. Adopting an information-theoretic analysis, we bound the Bayesian regret of our algorithm with respect to the oracle's action policy. We also extend the problem to a scenario where the agent observes the true context with some delay after receiving the reward, and show that delayed true contexts lead to lower Bayesian regret. Finally, we empirically demonstrate the performance of the proposed algorithms against baselines.
Related papers
- Partially Observable Contextual Bandits with Linear Payoffs [18.593061465167363]
We consider a new bandit setting with partially observable, correlated contexts and linear payoffs.
We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms.
arXiv Detail & Related papers (2024-09-17T19:47:04Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards [44.025369660607645]
We study the performance of the Thompson Sampling algorithm for Contextual Bandit problems.
We introduce new bounds on the lifted information ratio that hold for sub-Gaussian rewards.
arXiv Detail & Related papers (2023-04-26T14:40:01Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits (a minimal sketch of the Langevin sampling step appears after this list).
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits.
We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - On Thompson Sampling for Smoother-than-Lipschitz Bandits [6.929312022493406]
We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions.
Our bounds are realised by analysis of the eluder dimension.
We derive a new bound on the eluder dimension for classes of functions with Lipschitz derivatives.
arXiv Detail & Related papers (2020-01-08T00:46:13Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)