Replication-proof Bandit Mechanism Design
- URL: http://arxiv.org/abs/2312.16896v1
- Date: Thu, 28 Dec 2023 08:36:35 GMT
- Title: Replication-proof Bandit Mechanism Design
- Authors: Seyed Esmaeili, MohammadTaghi Hajiaghayi, Suho Shin
- Abstract summary: We study a problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms.
This extension presents significant challenges in analyzing equilibrium.
We provide a replication-proof algorithm for any problem instance.
- Score: 9.101603681930085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a problem of designing replication-proof bandit mechanisms when
agents strategically register or replicate their own arms to maximize their
payoff. We consider Bayesian agents who are unaware of the ex-post
realization of their own arms' mean rewards; ours is the first work to
study this Bayesian extension of Shin et al. (2022). The extension
presents significant challenges for equilibrium analysis, in contrast to
the fully informed setting of Shin et al. (2022), in which the problem
simply reduces to the case where each agent has only a single arm. With
Bayesian agents, even in the single-agent setting, analyzing the
replication-proofness of an algorithm becomes complicated. Remarkably, we
first show that the algorithm proposed by Shin et al. (2022), termed
H-UCB, is no longer replication-proof for any choice of exploration
parameters. We then provide necessary and sufficient conditions for an
algorithm to be replication-proof in the single-agent setting. These
results center on several analytical comparisons of the expected regret
across multiple bandit instances, which may be of independent interest.
We further prove that the exploration-then-commit (ETC) algorithm
satisfies these properties, whereas UCB does not; this failure is in fact
what prevents UCB from being replication-proof. We extend this result to
the multi-agent setting and provide a replication-proof algorithm for any
problem instance. The proof relies mainly on the single-agent result,
together with some structural properties of ETC and the novel
introduction of a restarting round, which greatly simplifies the analysis
while leaving the regret unchanged (up to a polylogarithmic factor). We
conclude by proving a sublinear regret upper bound for our algorithm,
which matches that of H-UCB.
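
Since the argument turns on the structural contrast between ETC and UCB, a minimal sketch of a generic explore-then-commit routine may help fix ideas. The function names, Bernoulli arms, and exploration length below are illustrative assumptions, not the paper's exact mechanism.

```python
import random

def explore_then_commit(arms, horizon, explore_rounds):
    """Generic explore-then-commit (ETC): pull every arm a fixed number of
    times, then commit to the empirically best arm for the remaining rounds.

    arms: list of zero-argument callables, each returning a sampled reward.
    horizon: total number of pulls.
    explore_rounds: pulls per arm during the non-adaptive exploration phase.
    """
    k = len(arms)
    totals = [0.0] * k
    pulls_used = 0
    # Exploration phase: uniform, non-adaptive sampling of every arm.
    for i in range(k):
        for _ in range(explore_rounds):
            totals[i] += arms[i]()
            pulls_used += 1
    # Commit phase: exploit the single arm with the highest empirical mean.
    best = max(range(k), key=lambda i: totals[i] / explore_rounds)
    committed = sum(arms[best]() for _ in range(horizon - pulls_used))
    return best, sum(totals) + committed

# Example: two Bernoulli arms with means 0.4 and 0.6.
arms = [lambda: float(random.random() < 0.4),
        lambda: float(random.random() < 0.6)]
best_arm, total_reward = explore_then_commit(arms, horizon=10_000,
                                             explore_rounds=200)
print(best_arm, total_reward)
```

One intuition consistent with the abstract: ETC's committed choice depends only on empirical means gathered in a fixed, non-adaptive phase, which limits what an agent can gain by replicating arms; the paper's actual multi-agent mechanism adds further structure, such as the restarting round.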
Related papers
- Replicability is Asymptotically Free in Multi-armed Bandits [41.86005698892541]
We consider a replicable multi-armed bandit algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
We propose a principled approach to limiting the probability of nonreplication.
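
One standard route to the replicability notion described above is to pass every data-dependent statistic through a coarse, randomly offset rounding whose randomness is shared across runs. The sketch below illustrates that trick in isolation; it is a simplified, assumption-laden rendering, not the cited paper's algorithm.

```python
import random

def shared_round(value, grid, rng):
    """Round value onto a grid of spacing `grid` with a random offset from
    rng. Two runs that share the rng seed and whose values differ by much
    less than `grid` round to the same point with high probability."""
    offset = rng.uniform(0.0, grid)
    return grid * round((value - offset) / grid) + offset

def replicable_commit(empirical_means, grid, seed):
    """Pick an arm from noisy empirical means in a dataset-stable way:
    round every mean with shared randomness, then take the argmax
    (ties broken toward the lower index)."""
    rng = random.Random(seed)
    rounded = [shared_round(m, grid, rng) for m in empirical_means]
    return max(range(len(rounded)), key=lambda i: (rounded[i], -i))

# Two runs on independent noisy estimates, sharing seed 7, usually agree.
truth = [0.40, 0.60]
run_a = [m + random.gauss(0.0, 0.01) for m in truth]
run_b = [m + random.gauss(0.0, 0.01) for m in truth]
print(replicable_commit(run_a, grid=0.1, seed=7),
      replicable_commit(run_b, grid=0.1, seed=7))
```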
arXiv Detail & Related papers (2024-02-12T03:31:34Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a stochastic and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
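
A toy rendering of this interaction protocol may make the setting concrete. The mediator policies, the round-robin querying rule, and the empirical-best recommendation below are illustrative simplifications rather than the paper's strategy.

```python
import random

def mediator_feedback_run(arm_means, mediator_policies, rounds, rng):
    """Toy mediators' feedback loop: each round the learner queries one
    mediator, the mediator draws an arm from its own policy, and the
    learner observes which arm was pulled together with its reward.

    arm_means: true Bernoulli mean of each arm.
    mediator_policies: one probability vector over arms per mediator.
    """
    k = len(arm_means)
    pulls, totals = [0] * k, [0.0] * k
    for t in range(rounds):
        policy = mediator_policies[t % len(mediator_policies)]  # round-robin
        arm = rng.choices(range(k), weights=policy)[0]  # mediator's draw
        reward = float(rng.random() < arm_means[arm])
        pulls[arm] += 1
        totals[arm] += reward
    # Recommend the empirically best arm among those actually observed.
    seen = [i for i in range(k) if pulls[i] > 0]
    return max(seen, key=lambda i: totals[i] / pulls[i])

rng = random.Random(0)
print(mediator_feedback_run(arm_means=[0.3, 0.5, 0.7],
                            mediator_policies=[[0.8, 0.2, 0.0],
                                               [0.0, 0.2, 0.8]],
                            rounds=2000, rng=rng))
```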
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
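
A common building block for tolerating an $\alpha$-fraction of adversarial reports is a robust aggregator at the server. The trimmed mean below is one standard choice, shown on scalar reports as a hedged illustration rather than the cited paper's procedure.

```python
def trimmed_mean(reports, alpha):
    """Robustly aggregate scalar reports: sort, drop the smallest and
    largest alpha-fractions, and average the rest. If at most an
    alpha-fraction of reports are adversarial, each corrupted value is
    either trimmed away or bounded by honest values on both sides."""
    n = len(reports)
    k = int(alpha * n)  # number of reports trimmed from each side
    kept = sorted(reports)[k:n - k]
    return sum(kept) / len(kept)

# Seven honest agents reporting near 1.0, two adversaries reporting junk.
reports = [0.98, 1.01, 0.99, 1.02, 1.00, 0.97, 1.03, 50.0, -50.0]
print(trimmed_mean(reports, alpha=0.25))  # close to 1.0
```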
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
The proposed algorithm, Bandit Online Clustering (BOC), includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Multi-armed Bandit Algorithm against Strategic Replication [5.235979896921492]
We consider a multi-armed bandit problem in which a set of arms is registered by each agent, and the agent receives reward when its arm is selected.
An agent might strategically submit more arms with replications, which can bring more reward by abusing the bandit algorithm's exploration-exploitation balance.
We propose a bandit algorithm which demotivates replications and also achieves a small cumulative regret.
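
Per the main abstract above, the algorithm of this entry is the H-UCB of Shin et al. (2022), a hierarchical scheme running UCB over agents and then over the chosen agent's arms. The sketch below follows that two-level idea, but its exact index terms and tie-breaking are assumptions.

```python
import math
import random

def ucb_index(total, count, t, c):
    """Standard UCB index; an unpulled option gets priority."""
    if count == 0:
        return float("inf")
    return total / count + c * math.sqrt(math.log(t) / count)

def hierarchical_ucb(agent_arms, horizon, c=2.0, rng=random):
    """Two-level UCB: pick an agent via a UCB index over agent-level
    rewards, then pick one of that agent's arms via an inner UCB index.
    Because each agent is a single meta-arm at the top level, replicating
    arms does not directly inflate the agent's share of pulls.

    agent_arms: per-agent list of Bernoulli arm means.
    """
    n = len(agent_arms)
    a_tot, a_cnt = [0.0] * n, [0] * n
    arm_tot = [[0.0] * len(arms) for arms in agent_arms]
    arm_cnt = [[0] * len(arms) for arms in agent_arms]
    for t in range(1, horizon + 1):
        i = max(range(n), key=lambda a: ucb_index(a_tot[a], a_cnt[a], t, c))
        j = max(range(len(agent_arms[i])),
                key=lambda a: ucb_index(arm_tot[i][a], arm_cnt[i][a], t, c))
        r = float(rng.random() < agent_arms[i][j])
        a_tot[i] += r
        a_cnt[i] += 1
        arm_tot[i][j] += r
        arm_cnt[i][j] += 1
    return a_cnt  # pulls received by each agent

# Agent 1 registers three copies of a 0.5-mean arm; its pull share stays
# comparable to registering that arm once.
print(hierarchical_ucb([[0.6], [0.5, 0.5, 0.5]], horizon=5000))
```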
arXiv Detail & Related papers (2021-10-23T07:38:44Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
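
Under composite and anonymous feedback, the reward of a pull is spread over subsequent rounds and the learner observes only each round's aggregate, with no attribution to arms. The tiny environment below makes that observation model concrete; the even spreading scheme is an illustrative assumption.

```python
import random
from collections import defaultdict

class CompositeAnonymousBandit:
    """Bandit environment with composite, anonymous feedback: each pull's
    reward is split evenly over the following `spread` rounds, and the
    learner observes only each round's summed contribution, with no
    attribution to arms or to past pulls."""

    def __init__(self, means, spread, rng):
        self.means, self.spread, self.rng = means, spread, rng
        self.pending = defaultdict(float)  # future round -> queued reward
        self.t = 0

    def pull(self, arm):
        reward = float(self.rng.random() < self.means[arm])
        for d in range(1, self.spread + 1):  # spread over the next rounds
            self.pending[self.t + d] += reward / self.spread
        self.t += 1
        return self.pending.pop(self.t, 0.0)  # aggregate, anonymous signal

env = CompositeAnonymousBandit([0.2, 0.8], spread=3, rng=random.Random(1))
print([round(env.pull(1), 2) for _ in range(6)])
```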
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
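
Here the target statistic is an arm's $\tau$-quantile rather than its mean. The uniform-sampling sketch below omits the paper's privacy mechanism and adaptive stopping; it shows only how the empirical quantile replaces the empirical mean as the decision statistic.

```python
import random

def empirical_quantile(samples, tau):
    """Empirical tau-quantile: the sorted sample at index floor(tau * n),
    clamped to the valid range."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(tau * len(s)))
    return s[idx]

def quantile_best_arm(arms, tau, pulls_per_arm, rng):
    """Pull every arm uniformly and return the arm whose empirical
    tau-quantile is largest. A PAC algorithm would add confidence bounds
    and adaptive stopping; this shows only the decision statistic."""
    stats = [empirical_quantile([arm(rng) for _ in range(pulls_per_arm)], tau)
             for arm in arms]
    return max(range(len(arms)), key=lambda i: stats[i])

rng = random.Random(3)
# Arm 0 has the higher mean, but arm 1's heavy upper tail gives it the
# larger 0.9-quantile, so it is the quantile-optimal arm.
arms = [lambda r: r.gauss(1.0, 0.1), lambda r: r.gauss(0.8, 1.0)]
print(quantile_best_arm(arms, tau=0.9, pulls_per_arm=2000, rng=rng))
```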
arXiv Detail & Related papers (2020-06-11T20:23:43Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
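
Since the characterization in this entry is phrased via Thompson sampling, a standard Beta-Bernoulli Thompson sampler is sketched below; it contains none of the paper's incentive-compatibility machinery.

```python
import random

def thompson_bernoulli(arm_means, horizon, rng):
    """Beta-Bernoulli Thompson sampling: keep a Beta(successes + 1,
    failures + 1) posterior per arm, draw one sample from each posterior,
    and pull the arm with the largest draw."""
    k = len(arm_means)
    wins, losses = [0] * k, [0] * k
    total = 0.0
    for _ in range(horizon):
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                 for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        if rng.random() < arm_means[arm]:
            wins[arm] += 1
            total += 1.0
        else:
            losses[arm] += 1
    return total

rng = random.Random(42)
print(thompson_bernoulli([0.3, 0.5, 0.7], horizon=5000, rng=rng))
```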
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all linked content) and is not responsible for any consequences of its use.