Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for
Martingale Mixtures
- URL: http://arxiv.org/abs/2309.14298v2
- Date: Wed, 27 Sep 2023 09:32:01 GMT
- Title: Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for
Martingale Mixtures
- Authors: Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters
- Abstract summary: We present improved algorithms with worst-case regret guarantees for the linear bandit problem.
We show that our confidence sequences are tighter than competitors, both empirically and theoretically.
- Score: 26.683757807252675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present improved algorithms with worst-case regret guarantees for the
stochastic linear bandit problem. The widely used "optimism in the face of
uncertainty" principle reduces a stochastic bandit problem to the construction
of a confidence sequence for the unknown reward function. The performance of
the resulting bandit algorithm depends on the size of the confidence sequence,
with smaller confidence sets yielding better empirical performance and stronger
regret guarantees. In this work, we use a novel tail bound for adaptive
martingale mixtures to construct confidence sequences which are suitable for
stochastic bandits. These confidence sequences allow for efficient action
selection via convex programming. We prove that a linear bandit algorithm based
on our confidence sequences is guaranteed to achieve competitive worst-case
regret. We show that our confidence sequences are tighter than competitors,
both empirically and theoretically. Finally, we demonstrate that our tighter
confidence sequences give improved performance in several hyperparameter tuning
tasks.
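The "optimism in the face of uncertainty" reduction described in the abstract can be illustrated with a generic LinUCB-style sketch. Note that this is not the paper's algorithm (which constructs its confidence sequence from adaptive martingale mixtures and selects actions via convex programming); it is the standard ellipsoidal-confidence-set baseline, with the function name, interface, and exploration parameter `beta` chosen purely for illustration.

```python
import numpy as np

def linucb(actions, reward_fn, T, beta=1.0, lam=1.0):
    """Generic optimism-in-the-face-of-uncertainty linear bandit sketch.

    actions   : (K, d) array of candidate action vectors.
    reward_fn : callable mapping a chosen action vector to a noisy reward.
    beta      : exploration parameter (radius of the confidence ellipsoid).
    """
    K, d = actions.shape
    V = lam * np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)                # running sum of a_t * r_t
    chosen = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b      # ridge estimate of the reward parameter
        # Optimistic value of each action over the ellipsoidal confidence set:
        # <theta_hat, a> + beta * ||a||_{V^{-1}}
        ucb = actions @ theta_hat + beta * np.sqrt(
            np.einsum("kd,de,ke->k", actions, V_inv, actions))
        i = int(np.argmax(ucb))
        r = reward_fn(actions[i])
        V += np.outer(actions[i], actions[i])
        b += actions[i] * r
        chosen.append(i)
    return theta_hat, chosen
```

The key point the abstract makes is that the size of the confidence set (here controlled by `beta`) directly governs both empirical performance and the regret guarantee: a tighter set means less over-exploration.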
Related papers
- Tighter Confidence Bounds for Sequential Kernel Regression [3.683202928838613]
We use martingale tail inequalities to establish new confidence bounds for sequential kernel regression.
Our confidence bounds can be computed by solving a conic program, although solving this program directly quickly becomes impractical.
We find that when our confidence bounds replace existing ones, the KernelUCB algorithm has better empirical performance, a matching worst-case performance guarantee and comparable computational cost.
arXiv Detail & Related papers (2024-03-19T13:47:35Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Huber-Robust Confidence Sequences [37.16361789841549]
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times.
We show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting.
Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
arXiv Detail & Related papers (2023-01-23T17:29:26Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- Open Problem: Tight Online Confidence Intervals for RKHS Elements [57.363123214464764]
We formalize the question of online confidence intervals in the RKHS setting and overview the existing results.
It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof.
arXiv Detail & Related papers (2021-10-28T22:36:20Z)
- Off-policy Confidence Sequences [33.749904615295485]
We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting.
We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency.
arXiv Detail & Related papers (2021-02-18T18:40:30Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
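The reduction described above can be sketched with classic UCB1 run over the base algorithms as meta-arms. This is an illustrative sketch, not the paper's exact meta-UCB procedure; the interface `pull(k)` (run base algorithm k for one round and return the observed reward) is a hypothetical assumption.

```python
import math

def meta_ucb(pull, n_arms, T):
    """UCB1 over n_arms meta-arms, each of which could itself be a bandit
    algorithm (illustrative sketch of the meta-UCB idea only)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, T + 1):
        if t <= n_arms:
            k = t - 1              # pull each meta-arm once to initialize
        else:
            # empirical mean plus UCB1 exploration bonus
            k = max(range(n_arms), key=lambda j:
                    sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        sums[k] += pull(k)
        counts[k] += 1
    return counts
```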
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- Near-Optimal Confidence Sequences for Bounded Random Variables [5.901337162013615]
A fundamental problem in online inference is to provide a sequence of confidence intervals that are valid uniformly over all sample sizes as the sample grows to infinity.
We provide a near-optimal confidence sequence for bounded random variables by utilizing Bentkus' concentration results.
The resulting confidence sequence is confirmed to be favorable in both synthetic coverage problems and an application to adaptive stopping algorithms.
arXiv Detail & Related papers (2020-06-09T02:50:01Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the value of an arm is the mode of its reward distribution rather than the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.