Implicitly normalized forecaster with clipping for linear and non-linear
heavy-tailed multi-armed bandits
- URL: http://arxiv.org/abs/2305.06743v3
- Date: Tue, 26 Dec 2023 13:07:25 GMT
- Title: Implicitly normalized forecaster with clipping for linear and non-linear
heavy-tailed multi-armed bandits
- Authors: Yuriy Dorn and Nikita Kornilov and Nikolay Kutuzov and Alexander Nazin
and Eduard Gorbunov and Alexander Gasnikov
- Abstract summary: Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions.
We demonstrate that INF-clip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
- Score: 85.27420062094086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the reward distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
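The core idea described in the abstract, clipping heavy-tailed reward observations before forming importance-weighted estimates so that the estimator has bounded variance, can be sketched as follows. This is a minimal illustration only: it substitutes a simple exponential-weights surrogate for the paper's exact Tsallis-entropy mirror step, and the arm means, Pareto noise, clip level, learning rate, and stability offset are all hypothetical choices, not the paper's tuned parameters.

```python
import numpy as np

def inf_clip_sketch(n_arms=3, horizon=2000, clip_level=10.0, lr=0.1, seed=0):
    """Sketch of an INF-style bandit update with clipped reward estimates.

    Exponential weights stand in for the Tsallis-entropy mirror step of
    the actual INF algorithm; all constants are illustrative.
    """
    rng = np.random.default_rng(seed)
    means = np.linspace(0.5, 0.1, n_arms)  # hypothetical per-arm means
    log_weights = np.zeros(n_arms)
    probs = np.full(n_arms, 1.0 / n_arms)
    for t in range(horizon):
        probs = np.exp(log_weights - log_weights.max())
        probs /= probs.sum()
        arm = rng.choice(n_arms, p=probs)
        # Heavy-tailed observation: Pareto noise with shape a < 2 has
        # infinite variance, the regime the paper targets.
        reward = means[arm] + (rng.pareto(1.5) - 2.0)
        # Clip before importance weighting so the estimate has bounded
        # variance even though the raw reward distribution does not.
        clipped = np.clip(reward, -clip_level, clip_level)
        estimate = np.zeros(n_arms)
        estimate[arm] = clipped / (probs[arm] + 1e-2)  # offset for stability
        log_weights += lr * estimate / np.sqrt(t + 1.0)
    return probs
```

The returned vector is the final sampling distribution over arms; the clipping step is what keeps the importance-weighted update well behaved despite the infinite-variance noise.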
Related papers
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints [21.38692458445459]
We study the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints.
A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case.
arXiv Detail & Related papers (2023-10-02T08:07:36Z)
- Log-based Sparse Nonnegative Matrix Factorization for Data Representation [55.72494900138061]
Nonnegative matrix factorization (NMF) has been widely studied in recent years due to its effectiveness in representing nonnegative data with parts-based representations.
We propose a new NMF method with log-norm imposed on the factor matrices to enhance the sparseness.
A novel column-wise sparse norm, named the $\ell_{2,\log}$-(pseudo) norm, is proposed to enhance the robustness of the proposed method.
arXiv Detail & Related papers (2022-04-22T11:38:10Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.