Linear Bandits with Non-i.i.d. Noise
- URL: http://arxiv.org/abs/2505.20017v2
- Date: Tue, 27 May 2025 09:24:30 GMT
- Title: Linear Bandits with Non-i.i.d. Noise
- Authors: Baptiste Abélès, Eugenio Clerico, Hamish Flynn, Gergely Neu
- Abstract summary: We study the linear bandit problem, relaxing the standard i.i.d. assumption on the observation noise. As an alternative to this restrictive assumption, we allow the noise terms across rounds to be sub-Gaussian but interdependent. We derive a bandit algorithm based on the principle of optimism in the face of uncertainty.
- Score: 8.937169040399775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the linear stochastic bandit problem, relaxing the standard i.i.d. assumption on the observation noise. As an alternative to this restrictive assumption, we allow the noise terms across rounds to be sub-Gaussian but interdependent, with dependencies that decay over time. To address this setting, we develop new confidence sequences using a recently introduced reduction scheme to sequential probability assignment, and use these to derive a bandit algorithm based on the principle of optimism in the face of uncertainty. We provide regret bounds for the resulting algorithm, expressed in terms of the decay rate of the strength of dependence between observations. Among other results, we show that our bounds recover the standard rates up to a factor of the mixing time for geometrically mixing observation noise.
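As an illustration of the optimism principle described in the abstract, below is a minimal LinUCB-style sketch. This is not the paper's algorithm verbatim: the paper builds its confidence radius from new confidence sequences tailored to dependent (mixing) noise, which the user-supplied placeholder `beta(t)` here merely stands in for.

```python
import numpy as np

def ofu_linear_bandit(actions, reward_fn, T, beta, lam=1.0):
    """Generic optimism-in-the-face-of-uncertainty loop for linear bandits.

    actions   : (n, d) array, one candidate action per row
    reward_fn : callable returning a noisy reward for a chosen action
    beta      : callable t -> confidence radius; in the paper this radius
                depends on the noise-dependence decay rate (placeholder here)
    """
    d = actions.shape[1]
    V = lam * np.eye(d)            # regularized design matrix
    b = np.zeros(d)                # running sum of x_t * y_t
    theta_hat = np.zeros(d)
    for t in range(1, T + 1):
        theta_hat = np.linalg.solve(V, b)   # ridge estimate of the parameter
        V_inv = np.linalg.inv(V)
        # optimistic index: <x, theta_hat> + beta(t) * ||x||_{V^{-1}}
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        ucb = actions @ theta_hat + beta(t) * widths
        x = actions[np.argmax(ucb)]         # play the optimistic action
        y = reward_fn(x)                    # observe the noisy reward
        V += np.outer(x, x)
        b += y * x
    return theta_hat
```

Under geometric mixing, the abstract suggests the usual radius is inflated by roughly the mixing time; a hypothetical placeholder in that spirit would be `beta = lambda t: tau_mix * (np.sqrt(lam) * S + sigma * np.sqrt(d * np.log(1 + t)))`, where `tau_mix`, `S`, and `sigma` are assumed problem constants, not quantities taken from the paper.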
Related papers
- Regularized least squares learning with heavy-tailed noise is minimax optimal [23.16162334363017]
This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of heavy-tailed noise that exhibits only a finite number of higher moments. We establish risk bounds consisting of subgaussian and excess terms based on the well-known integral operator framework.
arXiv Detail & Related papers (2025-05-20T11:17:54Z) - Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits [0.0]
We introduce a novel frequentist IDS algorithm that iteratively refines a high-probability upper bound on the true parameter norm using accumulating data. We establish regret bounds for our algorithm that do not depend on an initially assumed parameter norm bound and demonstrate that our method outperforms state-of-the-art IDS and UCB algorithms.
arXiv Detail & Related papers (2025-03-07T02:33:37Z) - Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates [61.091122503406304]
We show that the gradient bandit algorithm converges to a globally optimal policy almost surely using any constant learning rate. This result demonstrates that the gradient bandit algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down.
arXiv Detail & Related papers (2025-02-11T00:12:04Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic optimization methods for problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Convergent regularization in inverse problems and linear plug-and-play denoisers [3.759634359597638]
Plug-and-play (PnP) denoising is a popular framework for solving inverse imaging problems using image denoisers.
Not much is known about the properties of the converged solution as the noise level in the measurement vanishes to zero, i.e. whether PnP methods are provably convergent regularization schemes.
We show that with linear denoisers, the implicit regularization of the denoiser corresponds to an explicit regularization functional, which leads to a convergent regularization scheme.
arXiv Detail & Related papers (2023-07-18T17:16:08Z) - Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
arXiv Detail & Related papers (2021-10-21T16:57:45Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve robust generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - Randomized benchmarking in the presence of time-correlated dephasing noise [0.0]
A typical randomized benchmarking procedure identifies the exponential decay in the fidelity as the benchmarking sequence of gates increases in length.
That the fidelity decays exponentially, however, relies on the assumption of time-independent or static noise in the gates.
The precise mechanisms for deviation have yet to be fully explored.
arXiv Detail & Related papers (2020-10-22T07:33:41Z) - Estimating means of bounded random variables by betting [39.98103047898974]
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations.
We present a general approach for deriving concentration bounds, that can be seen as a generalization and improvement of the celebrated Chernoff method.
We show how to extend these ideas to sampling without replacement, another heavily studied problem (a minimal sketch of the betting approach appears after this list).
arXiv Detail & Related papers (2020-10-19T17:22:03Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
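As a concrete companion to the "Estimating means of bounded random variables by betting" entry above, here is a minimal sketch of the capital-process construction. The grid, bet size, and hedging rule are illustrative assumptions, not the tuned strategy from that paper.

```python
import numpy as np

def betting_cs(xs, alpha=0.05, grid_size=200):
    """Time-uniform confidence sequence for the mean of [0,1]-valued data.

    For each candidate mean m, the wealth K_t(m) = prod_s (1 + lam_s (x_s - m))
    is a nonnegative martingale when m is the true mean, so by Ville's
    inequality the set {m : K_t(m) < 1/alpha} contains the true mean for all t
    simultaneously with probability at least 1 - alpha.
    """
    grid = np.linspace(1e-6, 1 - 1e-6, grid_size)   # candidate means
    lw_plus = np.zeros(grid_size)    # log-wealth betting "mean > m"
    lw_minus = np.zeros(grid_size)   # log-wealth betting "mean < m"
    intervals = []
    for t, x in enumerate(xs, start=1):
        lam = 0.5 / np.sqrt(t)                      # crude predictable bet size
        lam_p = np.minimum(lam, 0.5 / grid)         # keep 1 + lam_p(x - m) > 0
        lam_m = np.minimum(lam, 0.5 / (1 - grid))   # keep 1 - lam_m(x - m) > 0
        lw_plus += np.log1p(lam_p * (x - grid))
        lw_minus += np.log1p(-lam_m * (x - grid))
        # hedged capital: average the two one-sided wealth processes
        log_K = np.logaddexp(lw_plus, lw_minus) - np.log(2)
        keep = grid[log_K < np.log(1 / alpha)]
        intervals.append((keep.min(), keep.max()) if keep.size else (np.nan, np.nan))
    return intervals
```

The bet `lam = 0.5 / sqrt(t)` is a deliberately crude predictable choice; the paper's contribution lies precisely in tuning such bets so the resulting intervals shrink at near-optimal rates. Tracking the running maximum of each wealth process would additionally make the intervals nested over time.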
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.