Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
- URL: http://arxiv.org/abs/2405.15200v1
- Date: Fri, 24 May 2024 04:11:58 GMT
- Title: Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
- Authors: Jie Bian, Vincent Y. F. Tan
- Abstract summary: The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
- Score: 55.938644481736446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach that offers a stronger theoretical guarantee of asymptotic optimality than the Kullback--Leibler Upper Confidence Bound (KL-UCB) algorithm for the multi-armed bandit problem. Additionally, it has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. Despite its effectiveness, the generalization of this algorithm to contextual bandits with linear payoffs has remained elusive. In this paper, we present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms. We demonstrate that LinIMED achieves a regret upper bound of $\widetilde{O}(d\sqrt{T})$, where $d$ is the dimension of the context and $T$ is the time horizon. Furthermore, extensive empirical studies reveal that LinIMED and its variants outperform widely used linear bandit algorithms such as LinUCB and Linear Thompson Sampling in some regimes.
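For intuition, here is a minimal Python sketch of an IMED-style index adapted to the linear setting. It is a sketch under assumed Gaussian rewards, with a ridge-regression estimate `theta_hat` and design-matrix inverse `V_inv` as hypothetical inputs, and it substitutes the effective count $1/\|x_a\|^2_{V^{-1}}$ for IMED's pull count $N_a$; the paper's exact LinIMED indices differ in their confidence scaling.

```python
import numpy as np

def imed_style_linear_indices(X, theta_hat, V_inv):
    """IMED-style indices for one round of a linear bandit.

    A minimal sketch under Gaussian-reward assumptions; the paper's
    LinIMED indices use a different (confidence-width-based) scaling.

    X         : (K, d) array of arm feature vectors.
    theta_hat : (d,) ridge-regression estimate of the unknown parameter.
    V_inv     : (d, d) inverse of the regularized design matrix.
    """
    mu_hat = X @ theta_hat                    # estimated mean rewards
    gaps = mu_hat.max() - mu_hat              # empirical gaps to the best arm
    # ||x_a||^2_{V^-1} shrinks as the direction x_a is explored, so its
    # reciprocal plays the role of IMED's pull count N_a.
    norms_sq = np.einsum('kd,de,ke->k', X, V_inv, X)
    eff_counts = 1.0 / np.maximum(norms_sq, 1e-12)
    # Gaussian IMED index: N_a * gap_a^2 / 2 + log N_a; play the minimizer.
    return eff_counts * gaps ** 2 / 2.0 + np.log(eff_counts)
```

Each round the arm with the smallest index is played: the empirically best arm has zero gap and is favored once its direction is well estimated, while under-explored directions keep small effective counts and hence small indices.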
Related papers
- Minimum Empirical Divergence for Sub-Gaussian Linear Bandits [10.750348548547704]
LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities.
Our empirical study shows that LinMED is competitive with state-of-the-art algorithms; a hedged sketch of an MED-style sampling rule follows.
arXiv Detail & Related papers (2024-10-31T21:54:44Z)
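For intuition, here is a minimal Python sketch of an MED-style randomized rule that samples arms with probability proportional to $\exp(-\text{index})$. The inputs `mu_hat` and `eff_counts` are hypothetical quantities (as in the sketch above), and LinMED's actual closed-form probabilities differ in detail.

```python
import numpy as np

def med_style_sampling_probs(mu_hat, eff_counts):
    """MED-style closed-form arm-sampling distribution (a sketch only;
    LinMED's actual probabilities differ in detail).

    mu_hat     : (K,) estimated mean rewards.
    eff_counts : (K,) effective pull counts (e.g. 1 / ||x_a||^2_{V^-1}).
    """
    gaps = mu_hat.max() - mu_hat
    logits = -eff_counts * gaps ** 2 / 2.0    # best arm has logit 0
    weights = np.exp(logits - logits.max())   # numerically stable
    return weights / weights.sum()            # closed form: one softmax pass
```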
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods for the R-CPE-MAB can be seen as special cases of so-called transductive linear bandits.
We propose the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback [27.192028744078282]
We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular maximization.
arXiv Detail & Related papers (2023-01-30T23:18:06Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors; a hedged sketch of a zeroth-order gradient estimate follows.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
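To illustrate what a zeroth-order paradigm looks like, here is a minimal Python sketch of the standard two-point gradient estimator built only from (possibly noisy) function values; the paper's actual algorithms wrap such estimates in a more refined optimization scheme, so this is a generic illustration, not the paper's construction.

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta=1e-2, num_dirs=20, rng=None):
    """Standard two-point zeroth-order gradient estimator (a generic
    sketch, not the paper's exact construction).

    f        : callable returning a (possibly noisy) scalar value.
    x        : (d,) current point.
    delta    : probing radius.
    num_dirs : number of random probing directions to average over.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                # random unit direction
        # Directional finite difference (f(x+du) - f(x-du)) / (2d) along u.
        grad += (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
    return d * grad / num_dirs                # d-factor corrects for sphere sampling
```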
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem; a hedged sketch of this idea follows.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
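A minimal Python sketch of that idea, assuming plain UCB1 at the meta level (the paper's actual meta-UCB procedure uses a more refined index):

```python
import math

class MetaUCB:
    """Treat N base bandit algorithms as arms of a higher-level N-armed
    bandit and pick among them with UCB1. A sketch of the idea only;
    the paper's meta-UCB procedure uses a more refined index.
    """

    def __init__(self, base_algorithms):
        self.base = base_algorithms
        self.counts = [0] * len(base_algorithms)
        self.sums = [0.0] * len(base_algorithms)
        self.t = 0

    def select(self):
        """Return the index of the base algorithm to follow this round."""
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:                        # play every meta-arm once
                return i
        return max(
            range(len(self.base)),
            key=lambda i: self.sums[i] / self.counts[i]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[i]),
        )

    def update(self, i, reward):
        """Credit the observed reward to meta-arm i (the chosen base
        algorithm would also update its own internal state)."""
        self.counts[i] += 1
        self.sums[i] += reward
```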
- An Empirical Study of Derivative-Free-Optimization Algorithms for Targeted Black-Box Attacks in Deep Neural Networks [8.368543987898732]
This paper considers four pre-existing state-of-the-art DFO-based algorithms and introduces a new algorithm built on BOBYQA.
We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify.
Experiments disclose how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack.
arXiv Detail & Related papers (2020-12-03T13:32:20Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider these problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits; a hedged sketch of a probability distortion follows.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
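As a concrete example of a distortion, here is a minimal Python sketch using the Tversky-Kahneman probability weighting function, one standard choice from cumulative prospect theory; whether this matches the paper's exact distortion family is an assumption.

```python
import numpy as np

def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting: overweights small
    probabilities and underweights large ones (gamma = 0.61 is a
    commonly cited fit; the paper's distortion family may differ)."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def distorted_value(samples, gamma=0.61):
    """CPT-style distorted value of an arm from nonnegative reward
    samples, integrating the distorted empirical tail probabilities."""
    xs = np.sort(np.asarray(samples, dtype=float))  # ascending order
    n = len(xs)
    tail = np.arange(n, 0, -1) / n                  # P(X >= x_(i)), empirical
    tail_next = np.arange(n - 1, -1, -1) / n        # P(X > x_(i)), empirical
    # Weight each order statistic by the distorted probability mass
    # w(P(X >= x)) - w(P(X > x)) it carries.
    return float(np.sum((tk_weight(tail) - tk_weight(tail_next)) * xs))
```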
This list is automatically generated from the titles and abstracts of the papers on this site.