Extreme Bandits using Robust Statistics
- URL: http://arxiv.org/abs/2109.04433v1
- Date: Thu, 9 Sep 2021 17:24:15 GMT
- Title: Extreme Bandits using Robust Statistics
- Authors: Sujay Bhatt, Ping Li, Gennady Samorodnitsky
- Abstract summary: We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.
We propose distribution free algorithms using robust statistics and characterize the statistical properties.
- Score: 12.6543086847761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a multi-armed bandit problem motivated by situations where only
the extreme values, as opposed to expected values in the classical bandit
setting, are of interest. We propose distribution free algorithms using robust
statistics and characterize the statistical properties. We show that the
provided algorithms achieve vanishing extremal regret under weaker conditions
than existing algorithms. Performance of the algorithms is demonstrated for the
finite-sample setting using numerical experiments. The results show superior
performance of the proposed algorithms compared to the well known algorithms.
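The abstract leaves the algorithms unspecified, so the following is only a hedged sketch of the extreme-bandit setting it describes: score each arm with a robust statistic of its observed extremes (here, a median of block maxima) and pull the highest-scoring arm, tracking the best reward seen, which is what extremal regret measures against. The function names, the block-maxima statistic, and the `pull` interface are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def median_of_block_maxima(rewards, block_size=10):
    """Robust score for an arm's extreme-value potential: split the arm's
    history into blocks and take the median of the block maxima.
    (Illustrative robust statistic, not the paper's exact estimator.)"""
    n_blocks = len(rewards) // block_size
    if n_blocks == 0:
        return float("inf")  # unexplored arms are pulled first
    blocks = np.array(rewards[: n_blocks * block_size]).reshape(n_blocks, block_size)
    return float(np.median(blocks.max(axis=1)))

def extreme_bandit(pull, n_arms, horizon, block_size=10):
    """Pull the arm with the best robust score; return the best reward seen."""
    history = [[] for _ in range(n_arms)]
    best_seen = -float("inf")
    for _ in range(horizon):
        scores = [median_of_block_maxima(h, block_size) for h in history]
        arm = int(np.argmax(scores))
        r = pull(arm)
        history[arm].append(r)
        best_seen = max(best_seen, r)
    return best_seen

# Example: heavy-tailed (Pareto) arms with different tail indices.
rng = np.random.default_rng(0)
print(extreme_bandit(lambda a: rng.pareto(1.5 + a), n_arms=3, horizon=2000))
```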
Related papers
- A Batch Sequential Halving Algorithm without Performance Degradation [0.8283940114367677]
We show that a simple batch version of the Sequential Halving algorithm does not degrade performance under practical conditions.
We empirically validate our claim through experiments, demonstrating the robust nature of the algorithm in fixed-size batch settings.
arXiv Detail & Related papers (2024-06-01T12:41:50Z)
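For reference, here is a compact sketch of Sequential Halving run in fixed-size batches, the setting the entry above studies: the budget is split evenly across elimination rounds, each surviving arm is pulled as one batch per round, and the worse half of the arms is dropped. The `pull_batch` interface is a hypothetical stand-in.

```python
import math
import numpy as np

def batch_sequential_halving(pull_batch, n_arms, budget):
    """Sequential Halving with per-round batches: `pull_batch(arm, k)`
    returns k i.i.d. rewards for `arm` (hypothetical interface)."""
    arms = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    means = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for _ in range(n_rounds):
        k = max(1, budget // (n_rounds * len(arms)))  # pulls per arm this round
        for a in arms:
            rewards = np.asarray(pull_batch(a, k))
            counts[a] += k
            means[a] += (rewards.sum() - k * means[a]) / counts[a]
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]  # keep the better half
    return arms[0]

# Example: Gaussian arms, arm 4 has the highest mean.
rng = np.random.default_rng(1)
print(batch_sequential_halving(
    lambda a, k: rng.normal(loc=a / 10, size=k), n_arms=5, budget=5000))
```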
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
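The scalar IMED index is easy to state: pull the arm minimising N_a * KL(mu_a, mu_*) + log N_a, where mu_* is the best empirical mean. A minimal Bernoulli sketch of that index follows; the entry's LinIMED contribution is the extension to linear payoffs, which this sketch does not cover.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_arm(means, counts):
    """IMED choice: argmin over arms of N_a * KL(mu_a, mu_*) + log(N_a).
    Assumes every arm was pulled at least once (one round-robin pass)."""
    mu_star = means.max()
    index = counts * bernoulli_kl(means, mu_star) + np.log(counts)
    return int(np.argmin(index))

# Example: the chosen arm balances the empirical gap to the best mean
# against how often it has already been played.
print(imed_arm(np.array([0.4, 0.55, 0.6]), np.array([30.0, 40.0, 50.0])))
```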
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
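The entry's sred rejects arms adaptively in any round; for context, below is the classic non-adaptive Successive Rejects baseline it refines, which discards exactly one arm per phase on a precomputed pull schedule. This is the textbook fixed-budget algorithm, not the paper's.

```python
import math
import numpy as np

def successive_rejects(pull, n_arms, budget):
    """Classic fixed-budget Successive Rejects: one arm discarded per
    phase on a fixed schedule (non-adaptive baseline)."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    n_prev = 0
    for k in range(1, n_arms):
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                counts[a] += 1
                means[a] += (pull(a) - means[a]) / counts[a]
        active.remove(min(active, key=lambda a: means[a]))  # drop worst arm
        n_prev = n_k
    return active[0]

rng = np.random.default_rng(2)
print(successive_rejects(lambda a: rng.normal(a / 10), n_arms=4, budget=4000))
```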
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
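To see what CSA is up against: with an exponentially large action class one can only sample the base elements and rely on an offline maximisation oracle. Below is the naive uniform-allocation baseline under those assumptions; `argmax_oracle` is a hypothetical problem-specific solver, and CSA's contribution is allocating the budget adaptively instead.

```python
import numpy as np

def uniform_allocation_cpe(pull_element, n_elements, budget, argmax_oracle):
    """Naive fixed-budget combinatorial pure exploration: spend the budget
    uniformly on base elements, then let an offline oracle maximise the
    estimated linear reward over the (possibly exponential) action class."""
    per_element = max(1, budget // n_elements)
    estimates = np.array([
        np.mean([pull_element(e) for _ in range(per_element)])
        for e in range(n_elements)
    ])
    return argmax_oracle(estimates)  # e.g. a matching or shortest-path solver

# Example: actions are all subsets of size 2; the oracle picks the top-2 elements.
rng = np.random.default_rng(3)
print(uniform_allocation_cpe(
    lambda e: rng.normal(e / 5), n_elements=5, budget=1000,
    argmax_oracle=lambda w: tuple(np.argsort(w)[-2:])))
```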
- Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
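One way to exploit the monotone payoffs the entry above describes is a window-based optimistic index: estimate an arm's recent mean and its non-negative trend, then extrapolate forward. A sketch in that spirit follows; the window size, extrapolation horizon, and estimator are assumptions, not the paper's definitions.

```python
import numpy as np

def rising_index(rewards, window=20, steps_ahead=50):
    """Optimistic index for a (rested) rising arm: recent-window mean plus
    an extrapolated non-negative trend. Illustrative only."""
    recent = np.asarray(rewards[-window:], dtype=float)
    if len(recent) < 4:
        return float("inf")  # not enough data: explore this arm
    half = len(recent) // 2
    slope = max(0.0, (recent[half:].mean() - recent[:half].mean()) / half)
    return recent.mean() + slope * steps_ahead

# Each round, the agent pulls the arm with the largest rising_index.
```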
- Some performance considerations when using multi-armed bandit algorithms in the presence of missing data [1.0499611180329804]
When using multi-armed bandit algorithms, the potential impact of missing data is often overlooked.
We investigate the impact of missing data on several bandit algorithms via a simulation study assuming the rewards are missing at random.
arXiv Detail & Related papers (2022-05-08T09:20:10Z)
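A minimal version of the kind of simulation the entry above describes: Gaussian rewards, each observation missing at random with probability `p_missing`, and an epsilon-greedy learner that simply skips updates for missing rewards. All parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def epsilon_greedy_mar(true_means, horizon, p_missing=0.3, eps=0.1):
    """Epsilon-greedy when rewards are missing at random: a missing reward
    updates nothing, so estimates converge more slowly on affected arms."""
    k = len(true_means)
    estimates = np.zeros(k)
    counts = np.zeros(k)
    for _ in range(horizon):
        if rng.random() < eps or counts.min() == 0:
            arm = int(rng.integers(k))          # explore
        else:
            arm = int(np.argmax(estimates))     # exploit
        reward = rng.normal(true_means[arm])
        if rng.random() >= p_missing:           # reward actually observed
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

print(epsilon_greedy_mar([0.0, 0.2, 0.5], horizon=5000))
```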
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
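The runtime-oriented, censored setting above can be illustrated with a UCB-style rule over capped runtimes: any run exceeding the cutoff is recorded as the cutoff (right-censored), and the learner optimistically picks the solver with the lowest lower-confidence runtime. A sketch under those assumptions, not the paper's adapted algorithms:

```python
import numpy as np

def choose_solver(capped_mean_runtime, counts, t, cutoff):
    """Optimistic (lower-confidence-bound) choice over solvers when lower
    runtime is better; t is the total number of runs so far."""
    bonus = cutoff * np.sqrt(2.0 * np.log(t) / counts)
    return int(np.argmin(capped_mean_runtime - bonus))

def record_run(arm, runtime, cutoff, capped_mean_runtime, counts):
    """Right-censor the observation at the cutoff, then update the mean."""
    observed = min(runtime, cutoff)  # censored feedback
    counts[arm] += 1
    capped_mean_runtime[arm] += (observed - capped_mean_runtime[arm]) / counts[arm]
```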
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
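The meta-UCB construction described above is concrete enough to sketch: treat each of the N base bandit algorithms as an arm of a higher-level UCB1 and feed reward only to the base algorithm that was actually played. The `select`/`update` interface on base algorithms is a hypothetical stand-in, and this uses plain UCB1 at the meta level rather than the paper's exact procedure.

```python
import math

class MetaUCB:
    """Treat N base bandit algorithms as arms in an N-armed UCB1 problem."""

    def __init__(self, base_algorithms):
        self.base = base_algorithms          # each has .select() and .update(arm, r)
        self.n = [0] * len(base_algorithms)  # meta-level pull counts
        self.mean = [0.0] * len(base_algorithms)
        self.t = 0

    def step(self, pull):
        self.t += 1
        if 0 in self.n:                      # play each base algorithm once first
            i = self.n.index(0)
        else:
            i = max(range(len(self.base)),
                    key=lambda j: self.mean[j]
                    + math.sqrt(2.0 * math.log(self.t) / self.n[j]))
        arm = self.base[i].select()          # chosen base algorithm picks the arm
        reward = pull(arm)
        self.base[i].update(arm, reward)     # only the chosen base alg sees feedback
        self.n[i] += 1
        self.mean[i] += (reward - self.mean[i]) / self.n[i]
        return arm, reward
```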
- Bandit algorithms: Letting go of logarithmic regret for statistical robustness [0.0]
We study regret in a multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm and its statistical robustness.
We show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret.
arXiv Detail & Related papers (2020-06-22T07:18:47Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
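A sketch of the Thompson Sampling flavour for Gaussian mean-variance bandits described above: per arm, sample a precision from a Gamma posterior and a mean given that precision, then rank arms by the sampled mean-variance objective mu - rho * sigma^2. The loose posterior choices and the risk parameter rho are assumptions; the paper derives the exact algorithms and their regret bounds.

```python
import numpy as np

rng = np.random.default_rng(5)

def ts_mean_variance_arm(sums, sums_sq, counts, rho=1.0):
    """Thompson-Sampling-style draw for mean-variance bandits: sample
    (mu, sigma^2) per arm and maximise mu - rho * sigma^2. Assumes each
    arm has at least a few observations; posteriors are illustrative."""
    n_arms = len(counts)
    scores = np.empty(n_arms)
    for a in range(n_arms):
        n = counts[a]
        mean = sums[a] / n
        var = max(sums_sq[a] / n - mean**2, 1e-8)
        precision = rng.gamma(shape=n / 2.0, scale=2.0 / (n * var))
        mu = rng.normal(mean, 1.0 / np.sqrt(n * precision))
        scores[a] = mu - rho / precision     # sigma^2 = 1 / precision
    return int(np.argmax(scores))
```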
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
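The distortion idea above can be made concrete with a CPT-style probability weighting applied to an arm's empirical distribution: the "value" of an arm becomes the integral of the weighted tail probabilities rather than the plain mean. A sketch for non-negative rewards, with the weighting function and its parameter as illustrative choices:

```python
import numpy as np

def cpt_weight(p, gamma=0.61):
    """CPT-style probability weighting: overweights small probabilities."""
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def distorted_value(samples):
    """Distorted value of non-negative samples: integrate the weighted
    empirical survival function, sum_i w(P(X >= x_i)) * (x_i - x_{i-1}).
    A distortion-aware bandit ranks arms by this statistic instead of
    the plain empirical mean."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = cpt_weight(1.0 - np.arange(n) / n)    # w(P(X >= x_i))
    gaps = np.diff(np.concatenate(([0.0], x)))   # x_i - x_{i-1}, with x_0 = 0
    return float(np.sum(tail * gaps))

rng = np.random.default_rng(6)
print(distorted_value(rng.exponential(1.0, size=1000)))  # compare with mean ~1.0
```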
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.