Related papers: Identifying Copeland Winners in Dueling Bandits with Indifferences

Identifying Copeland Winners in Dueling Bandits with Indifferences

URL: http://arxiv.org/abs/2310.00750v1
Date: Sun, 1 Oct 2023 17:59:27 GMT
Title: Identifying Copeland Winners in Dueling Bandits with Indifferences
Authors: Viktor Bengs, Bj\"orn Haddenhorst, Eyke H\"ullermeier
Abstract summary: We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. We propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance.
Score: 12.96903983663382
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

Related papers

The Batch Complexity of Bandit Pure Exploration [10.036727981085223]
In a pure exploration problem in multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task.
arXiv Detail & Related papers (2025-02-03T15:03:45Z)
Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting. We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM) We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
Batched Dueling Bandits [13.69077222007053]
We study the batched $K$-armed dueling bandit problem under two standard settings. We obtain algorithms with a smooth trade-off between the number of batches and regret.
arXiv Detail & Related papers (2022-02-22T04:02:36Z)
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandit for both and adversarial environments. We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits. Our algorithm is also the first to achieve an optimal $O(sum_i = 1K fraclog TDelta_i)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
Statistical Consequences of Dueling Bandits [0.0]
Multi-Armed-Bandit frameworks have often been used to assess educational interventions. Recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation. We compare traditional uniform sampling to a dueling bandit algorithm and find that dueling bandit algorithms perform well at cumulative regret minimisation, but lead to inflated Type-I error rates and reduced power under certain circumstances.
arXiv Detail & Related papers (2021-10-16T23:48:43Z)
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items. We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons. We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
Thompson Sampling for Bandits with Clustered Arms [7.237493755167875]
We show, theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost. Our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
arXiv Detail & Related papers (2021-09-06T08:58:01Z)
Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased. We propose a novel approach for dueling bandits based on information-directed sampling (IDS) Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator. We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors. Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.