Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
- URL: http://arxiv.org/abs/2205.03820v1
- Date: Sun, 8 May 2022 09:20:10 GMT
- Title: Some performance considerations when using multi-armed bandit algorithms in the presence of missing data
- Authors: Xijin Chen, Kim May Lee, Sofia S. Villar, and David S. Robertson
- Abstract summary: When using multi-armed bandit algorithms, the potential impact of missing data is often overlooked.
We investigate the impact of missing data on several bandit algorithms via a simulation study assuming the rewards are missing at random.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When using multi-armed bandit algorithms, the potential impact of missing
data is often overlooked. In practice, the simplest approach is to ignore
missing outcomes and continue to sample following the bandit algorithm. We
investigate the impact of missing data on several bandit algorithms via a
simulation study assuming the rewards are missing at random. We focus on
two-armed bandit algorithms with binary outcomes in the context of patient
allocation for clinical trials with relatively small sample sizes. However, our
results can apply to other applications of bandit algorithms where missing data
is expected to occur. We assess the resulting operating characteristics,
including the expected reward (i.e., allocation results). Different
probabilities of missingness in both arms are considered. The key finding of
our work is that when using the simplest strategy of ignoring missing data, the
corresponding impact on the performance of multi-armed bandit strategies varies
according to their way of balancing the exploration-exploitation trade-off.
Algorithms that are geared towards exploration continue to assign samples to
the arm with more missing responses, and this arm is perceived as the superior
arm by the algorithm. By contrast, algorithms that are geared towards
exploitation would do the opposite and not assign samples to the arms with more
missing responses. Furthermore, for algorithms focusing more on exploration, we
illustrate that the problem of missing responses can be alleviated using a
simple mean imputation approach.
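To make the ignore-versus-impute comparison concrete, here is a minimal simulation sketch in the spirit of the study (not the authors' code): a two-armed Bernoulli bandit run under Thompson sampling, where each arm's outcome goes missing at random with its own probability. The reward probabilities, missingness rates, sample size, and the fractional-posterior form of mean imputation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(p=(0.3, 0.5), p_miss=(0.4, 0.0), n=100, impute=False):
    """Two-armed Bernoulli bandit under Thompson sampling with rewards
    missing at random; missing outcomes are ignored or mean-imputed."""
    alpha, beta = np.ones(2), np.ones(2)   # Beta(1, 1) prior per arm
    observed = ([], [])                    # non-missing rewards per arm
    pulls = np.zeros(2, dtype=int)
    for _ in range(n):
        a = int(np.argmax(rng.beta(alpha, beta)))  # posterior draw per arm
        pulls[a] += 1
        reward = rng.binomial(1, p[a])
        if rng.random() < p_miss[a]:       # this outcome goes missing
            if impute and observed[a]:     # one simple way to encode mean
                m = np.mean(observed[a])   # imputation: a fractional update
                alpha[a] += m
                beta[a] += 1.0 - m
            # "ignore" strategy: no update at all
        else:
            observed[a].append(reward)
            alpha[a] += reward
            beta[a] += 1 - reward
    return pulls

print("ignore:", simulate(impute=False))   # allocation counts per arm
print("impute:", simulate(impute=True))
```

With missingness concentrated on one arm, the contrast between the two strategies shows up directly in the allocation counts.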
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
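The core estimation step of the entry above, fitting a reward function from pairwise preferences, can be sketched by substituting a linear Bradley-Terry model for the paper's neural network; the data-generating settings below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Duels between arms with features X1[i], X2[i]; the winner is drawn from a
# Bradley-Terry model on a latent utility w_true . x (a linear stand-in for
# the paper's neural reward estimator).
d, n = 4, 2000
w_true = rng.normal(size=d)
X1 = rng.normal(size=(n, d))
X2 = rng.normal(size=(n, d))
p_first_wins = 1.0 / (1.0 + np.exp(-(X1 - X2) @ w_true))
y = rng.binomial(1, p_first_wins)        # 1 if the first arm is preferred

# Fit the utility weights by gradient ascent on the logistic log-likelihood.
w = np.zeros(d)
for _ in range(200):
    z = (X1 - X2) @ w
    grad = (X1 - X2).T @ (y - 1.0 / (1.0 + np.exp(-z))) / n
    w += 0.5 * grad
print("correlation with true utility:", round(np.corrcoef(w, w_true)[0, 1], 3))
```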
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
The sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
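A generic successive elimination loop, in the spirit of the second algorithm above, can be sketched as follows; this is an illustrative sketch, not the paper's algorithm, and the Hoeffding-style confidence radius and arm means are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def successive_elimination(means, delta=0.05, max_rounds=100_000):
    """Pull every active arm once per round; drop an arm when its upper
    confidence bound falls below the best lower confidence bound."""
    k = len(means)
    active = list(range(k))
    sums = np.zeros(k)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:
            sums[a] += rng.binomial(1, means[a])
        mu = sums[active] / t                       # empirical means of active arms
        rad = np.sqrt(np.log(4 * k * t * t / delta) / (2 * t))  # Hoeffding radius
        active = [a for a, m in zip(active, mu) if m + rad >= mu.max() - rad]
    return max(active, key=lambda a: sums[a])       # surviving (or best) arm

print("identified arm:", successive_elimination([0.4, 0.5, 0.65]))
```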
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon.
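As a rough sketch of bandit selection under censored runtimes (illustrative only: the cutoff, the runtime distributions, and the capped-loss LCB rule below are assumptions, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(2)

CUTOFF = 10.0  # per-instance time budget; runs hitting it are censored

def run_candidate(a):
    """Hypothetical stand-in for running candidate algorithm `a` on one
    instance: a latent runtime, censored at the cutoff."""
    latent = rng.exponential(scale=(3.0, 6.0, 12.0)[a])
    return min(latent, CUTOFF)

# LCB selection over capped runtimes: the capped runtime itself is the loss,
# so the per-arm statistics never depend on the time horizon.
k, T = 3, 500
n = np.zeros(k)
tot = np.zeros(k)
for t in range(1, T + 1):
    if t <= k:
        a = t - 1                                     # pull each arm once
    else:
        lcb = tot / n - CUTOFF * np.sqrt(2 * np.log(t) / n)
        a = int(np.argmin(lcb))                       # optimism for losses
    loss = run_candidate(a)
    n[a] += 1
    tot[a] += loss
print("pulls per candidate algorithm:", n.astype(int))
```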
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Extreme Bandits using Robust Statistics [12.6543086847761]
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest.
We propose distribution-free algorithms using robust statistics and characterize their statistical properties.
arXiv Detail & Related papers (2021-09-09T17:24:15Z)
- Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information [9.467920768170515]
We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information.
Our algorithm circumvents two problems shared by all state-of-the-art algorithms.
It does not require a lower bound on the minimal expected reward of an arm as input, and its performance does not scale inversely with the minimal expected reward.
arXiv Detail & Related papers (2021-03-24T10:14:16Z)
- Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape [0.11470070927586014]
We show that Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection.
We detail how attacks with unstable, distributed or non-repeatable behavior, such as Fuzzing, Worms and Botnets, are more difficult to detect.
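A minimal sketch of two of the detectors above via scikit-learn (the Self-Organizing Map is omitted, as it needs a third-party library; the synthetic data and contamination settings are illustrative assumptions, not the paper's benchmark):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)

# Synthetic stand-in for traffic features: dense "normal" points plus a
# handful of scattered anomalies.
normal = rng.normal(0.0, 1.0, size=(500, 4))
attacks = rng.uniform(-6.0, 6.0, size=(25, 4))
X = np.vstack([normal, attacks])
y = np.r_[np.ones(500), -np.ones(25)]    # +1 = normal, -1 = anomaly

for model in (IsolationForest(contamination=0.05, random_state=0),
              OneClassSVM(nu=0.05, gamma="scale")):
    pred = model.fit(normal).predict(X)  # train on normal traffic only
    print(type(model).__name__, "accuracy:", round((pred == y).mean(), 3))
```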
arXiv Detail & Related papers (2020-12-21T14:06:58Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
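The feedback model itself is easy to simulate; the sketch below shows how each pull's reward is spread over later rounds and observed only in aggregate (the uniform policy and Dirichlet spreading weights are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(4)

# Composite & anonymous feedback: the reward of a pull is spread over the
# next d rounds, and the learner observes only the per-round aggregate,
# with no arm labels attached.
k, T, d = 3, 20, 4
means = (0.2, 0.5, 0.8)
pending = np.zeros(T + d)                 # aggregated future feedback
for t in range(T):
    a = int(rng.integers(k))              # placeholder policy: uniform play
    r = rng.binomial(1, means[a])
    weights = rng.dirichlet(np.ones(d))   # how the reward spreads over time
    pending[t:t + d] += r * weights
    print(f"t={t:2d}: pulled arm {a}, observed aggregate {pending[t]:.2f}")
```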
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
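For reference, a bare Bernoulli Thompson sampling loop, the algorithm the last entry analyzes; the parameters are illustrative, and vanilla TS here carries none of the paper's incentive-compatibility constraints:

```python
import numpy as np

rng = np.random.default_rng(6)

# Bernoulli Thompson sampling: sample a mean from each arm's Beta posterior
# and play the argmax. Incentivized exploration asks when such a
# recommendation also looks optimal to a self-interested agent.
means = (0.35, 0.45, 0.6)                 # illustrative reward probabilities
k, T = len(means), 2000
alpha = np.ones(k)
beta = np.ones(k)
for _ in range(T):
    a = int(np.argmax(rng.beta(alpha, beta)))
    r = rng.binomial(1, means[a])
    alpha[a] += r
    beta[a] += 1 - r
print("posterior means:", np.round(alpha / (alpha + beta), 3))
print("pull counts:", (alpha + beta - 2).astype(int))
```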
This list is automatically generated from the titles and abstracts of the papers on this site.