Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
- URL: http://arxiv.org/abs/2205.03820v1
- Date: Sun, 8 May 2022 09:20:10 GMT
- Title: Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
- Authors: Xijin Chen, Kim May Lee, Sofia S. Villar, and David S. Robertson
- Abstract summary: When using multi-armed bandit algorithms, the potential impact of missing data is often overlooked.
We investigate the impact of missing data on several bandit algorithms via a simulation study assuming the rewards are missing at random.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When using multi-armed bandit algorithms, the potential impact of missing
data is often overlooked. In practice, the simplest approach is to ignore
missing outcomes and continue to sample following the bandit algorithm. We
investigate the impact of missing data on several bandit algorithms via a
simulation study assuming the rewards are missing at random. We focus on
two-armed bandit algorithms with binary outcomes in the context of patient
allocation for clinical trials with relatively small sample sizes. However, our
results can apply to other applications of bandit algorithms where missing data
is expected to occur. We assess the resulting operating characteristics,
including the expected reward (i.e., allocation results). Different
probabilities of missingness in both arms are considered. The key finding of
our work is that when using the simplest strategy of ignoring missing data, the
corresponding impact on the performance of multi-armed bandit strategies varies
according to their way of balancing the exploration-exploitation trade-off.
Algorithms that are geared towards exploration continue to assign samples to
the arm with more missing responses, and this arm is perceived as the superior
arm by the algorithm. By contrast, algorithms that are geared towards
exploitation would do the opposite and not assign samples to the arms with more
missing responses. Furthermore, for algorithms focusing more on exploration, we
illustrate that the problem of missing responses can be alleviated using a
simple mean imputation approach.
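To make the ignore-versus-impute comparison concrete, here is a minimal simulation sketch in the spirit of the study (not the authors' code): a two-armed Bernoulli bandit run under Thompson sampling, where each arm's outcome goes missing at random with its own probability. The reward probabilities, missingness rates, sample size, and the fractional-posterior form of mean imputation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(p=(0.3, 0.5), p_miss=(0.4, 0.0), n=100, impute=False):
    """Two-armed Bernoulli bandit under Thompson sampling with rewards
    missing at random; missing outcomes are ignored or mean-imputed."""
    alpha, beta = np.ones(2), np.ones(2)   # Beta(1, 1) prior per arm
    observed = ([], [])                    # non-missing rewards per arm
    pulls = np.zeros(2, dtype=int)
    for _ in range(n):
        a = int(np.argmax(rng.beta(alpha, beta)))  # posterior draw per arm
        pulls[a] += 1
        reward = rng.binomial(1, p[a])
        if rng.random() < p_miss[a]:       # this outcome goes missing
            if impute and observed[a]:     # one simple way to encode mean
                m = np.mean(observed[a])   # imputation: a fractional update
                alpha[a] += m
                beta[a] += 1.0 - m
            # "ignore" strategy: no update at all
        else:
            observed[a].append(reward)
            alpha[a] += reward
            beta[a] += 1 - reward
    return pulls

print("ignore:", simulate(impute=False))   # allocation counts per arm
print("impute:", simulate(impute=True))
```

With missingness concentrated on one arm, the contrast between the two strategies shows up directly in the allocation counts.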
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
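The core estimation step of the entry above, fitting a reward function from pairwise preferences, can be sketched by substituting a linear Bradley-Terry model for the paper's neural network; the data-generating settings below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Duels between arms with features X1[i], X2[i]; the winner is drawn from a
# Bradley-Terry model on a latent utility w_true . x (a linear stand-in for
# the paper's neural reward estimator).
d, n = 4, 2000
w_true = rng.normal(size=d)
X1 = rng.normal(size=(n, d))
X2 = rng.normal(size=(n, d))
p_first_wins = 1.0 / (1.0 + np.exp(-(X1 - X2) @ w_true))
y = rng.binomial(1, p_first_wins)        # 1 if the first arm is preferred

# Fit the utility weights by gradient ascent on the logistic log-likelihood.
w = np.zeros(d)
for _ in range(200):
    z = (X1 - X2) @ w
    grad = (X1 - X2).T @ (y - 1.0 / (1.0 + np.exp(-z))) / n
    w += 0.5 * grad
print("correlation with true utility:", round(np.corrcoef(w, w_true)[0, 1], 3))
```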
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
The sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
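A generic successive elimination loop, in the spirit of the second algorithm above, can be sketched as follows; this is an illustrative sketch, not the paper's algorithm, and the Hoeffding-style confidence radius and arm means are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def successive_elimination(means, delta=0.05, max_rounds=100_000):
    """Pull every active arm once per round; drop an arm when its upper
    confidence bound falls below the best lower confidence bound."""
    k = len(means)
    active = list(range(k))
    sums = np.zeros(k)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:
            sums[a] += rng.binomial(1, means[a])
        mu = sums[active] / t                       # empirical means of active arms
        rad = np.sqrt(np.log(4 * k * t * t / delta) / (2 * t))  # Hoeffding radius
        active = [a for a, m in zip(active, mu) if m + rad >= mu.max() - rad]
    return max(active, key=lambda a: sums[a])       # surviving (or best) arm

print("identified arm:", successive_elimination([0.4, 0.5, 0.65]))
```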
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon.
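As a rough sketch of bandit selection under censored runtimes (illustrative only: the cutoff, the runtime distributions, and the capped-loss LCB rule below are assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(2)

CUTOFF = 10.0  # per-instance time budget; runs hitting it are censored

def run_candidate(a):
    """Hypothetical stand-in for running candidate algorithm `a` on one
    instance: a latent runtime, censored at the cutoff."""
    latent = rng.exponential(scale=(3.0, 6.0, 12.0)[a])
    return min(latent, CUTOFF)

# LCB selection over capped runtimes: the capped runtime itself is the loss,
# so the per-arm statistics never depend on the time horizon.
k, T = 3, 500
n = np.zeros(k)
tot = np.zeros(k)
for t in range(1, T + 1):
    if t <= k:
        a = t - 1                                     # pull each arm once
    else:
        lcb = tot / n - CUTOFF * np.sqrt(2 * np.log(t) / n)
        a = int(np.argmin(lcb))                       # optimism for losses
    loss = run_candidate(a)
    n[a] += 1
    tot[a] += loss
print("pulls per candidate algorithm:", n.astype(int))
```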
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Extreme Bandits using Robust Statistics [12.6543086847761]
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.
We propose distribution-free algorithms using robust statistics and characterize their statistical properties.
arXiv Detail & Related papers (2021-09-09T17:24:15Z)
- Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information [9.467920768170515]
We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.
Our algorithm circumvents two problems shared by all state-of-the-art algorithms.
It does not require a lower bound on the minimal expected reward of an arm as input, and its performance does not scale inversely with the minimal expected reward.
arXiv Detail & Related papers (2021-03-24T10:14:16Z)
- Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape [0.11470070927586014]
We show that Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection.
We detail how attacks with unstable, distributed or non-repeatable behavior, such as Fuzzing, Worms and Botnets, are more difficult to detect.
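A minimal sketch of two of the detectors above via scikit-learn (the Self-Organizing Map is omitted, as it needs a third-party library; the synthetic data and contamination settings are illustrative assumptions, not the paper's benchmark):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)

# Synthetic stand-in for traffic features: dense "normal" points plus a
# handful of scattered anomalies.
normal = rng.normal(0.0, 1.0, size=(500, 4))
attacks = rng.uniform(-6.0, 6.0, size=(25, 4))
X = np.vstack([normal, attacks])
y = np.r_[np.ones(500), -np.ones(25)]    # +1 = normal, -1 = anomaly

for model in (IsolationForest(contamination=0.05, random_state=0),
              OneClassSVM(nu=0.05, gamma="scale")):
    pred = model.fit(normal).predict(X)  # train on normal traffic only
    print(type(model).__name__, "accuracy:", round((pred == y).mean(), 3))
```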
arXiv Detail & Related papers (2020-12-21T14:06:58Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
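The feedback model itself is easy to simulate; the sketch below shows how each pull's reward is spread over later rounds and observed only in aggregate (the uniform policy and Dirichlet spreading weights are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(4)

# Composite & anonymous feedback: the reward of a pull is spread over the
# next d rounds, and the learner observes only the per-round aggregate,
# with no arm labels attached.
k, T, d = 3, 20, 4
means = (0.2, 0.5, 0.8)
pending = np.zeros(T + d)                 # aggregated future feedback
for t in range(T):
    a = int(rng.integers(k))              # placeholder policy: uniform play
    r = rng.binomial(1, means[a])
    weights = rng.dirichlet(np.ones(d))   # how the reward spreads over time
    pending[t:t + d] += r * weights
    print(f"t={t:2d}: pulled arm {a}, observed aggregate {pending[t]:.2f}")
```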
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
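For reference, a bare Bernoulli Thompson sampling loop, the algorithm the last entry analyzes; the parameters are illustrative, and vanilla TS here carries none of the paper's incentive-compatibility constraints:

```python
import numpy as np

rng = np.random.default_rng(6)

# Bernoulli Thompson sampling: sample a mean from each arm's Beta posterior
# and play the argmax. Incentivized exploration asks when such a
# recommendation also looks optimal to a self-interested agent.
means = (0.35, 0.45, 0.6)                 # illustrative reward probabilities
k, T = len(means), 2000
alpha = np.ones(k)
beta = np.ones(k)
for _ in range(T):
    a = int(np.argmax(rng.beta(alpha, beta)))
    r = rng.binomial(1, means[a])
    alpha[a] += r
    beta[a] += 1 - r
print("posterior means:", np.round(alpha / (alpha + beta), 3))
print("pull counts:", (alpha + beta - 2).astype(int))
```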
This list is automatically generated from the titles and abstracts of the papers on this site.