Replication-proof Bandit Mechanism Design
- URL: http://arxiv.org/abs/2312.16896v1
- Date: Thu, 28 Dec 2023 08:36:35 GMT
- Title: Replication-proof Bandit Mechanism Design
- Authors: Seyed Esmaeili, MohammadTaghi Hajiaghayi, Suho Shin
- Abstract summary: We study a problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms.
This extension presents significant challenges in analyzing equilibrium.
We provide a replication-proof algorithm for any problem instance.
- Score: 9.101603681930085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a problem of designing replication-proof bandit mechanisms when
agents strategically register or replicate their own arms to maximize their
payoff. We consider Bayesian agents who are unaware of the ex-post realization of
their own arms' mean rewards; this is the first study of a Bayesian extension of
Shin et al. (2022). This extension presents significant challenges in analyzing
equilibrium, in contrast to the fully-informed setting by Shin et al. (2022)
under which the problem simply reduces to a case where each agent only has a
single arm. With Bayesian agents, even in a single-agent setting, analyzing the
replication-proofness of an algorithm becomes complicated. Remarkably, we first
show that the algorithm proposed by Shin et al. (2022), called H-UCB, is no
longer replication-proof for any exploration parameters. Then, we provide
sufficient and necessary conditions for an algorithm to be replication-proof in
the single-agent setting. These results center on several analytical
results in comparing the expected regret of multiple bandit instances, which
might be of independent interest. We further prove that the exploration-then-commit
(ETC) algorithm satisfies these properties, whereas UCB does not, which in fact
causes its failure to be replication-proof. We extend this result to the
multi-agent setting, and provide a replication-proof algorithm for any problem
instance. The proof mainly relies on the single-agent result, as well as some
structural properties of ETC and the novel introduction of a restarting round,
which largely simplifies the analysis while maintaining the regret unchanged
(up to a polylogarithmic factor). We finalize our result by proving its sublinear
regret upper bound, which matches that of H-UCB.
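For readers unfamiliar with exploration-then-commit, the following is a minimal sketch of the generic textbook ETC rule on Bernoulli arms. It is not the paper's mechanism: it has no agents, no hierarchical structure, and no restarting round. The function name `explore_then_commit` and the parameters `means`, `horizon`, `explore_per_arm`, and `seed` are assumptions chosen for illustration.

```python
import random


def explore_then_commit(means, horizon, explore_per_arm, seed=0):
    """Generic ETC on Bernoulli arms (illustrative sketch, not the paper's mechanism).

    Pulls every arm `explore_per_arm` times in round-robin order, then commits
    to the arm with the highest empirical mean for all remaining rounds.
    Returns (committed_arm, pull_counts, total_reward).
    """
    if explore_per_arm < 1:
        raise ValueError("explore_per_arm must be at least 1")
    rng = random.Random(seed)  # local generator so runs are reproducible
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total = 0.0
    committed = None   # stays None if the horizon ends during exploration
    for t in range(horizon):
        if t < k * explore_per_arm:
            arm = t % k  # exploration phase: round-robin over arms
        else:
            if committed is None:
                # commit once, to the empirically best arm
                committed = max(range(k), key=lambda a: sums[a] / counts[a])
            arm = committed
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return committed, counts, total


# Degenerate instance where the outcome does not depend on the seed:
# arm 0 never pays and arm 1 always pays.
committed, counts, total = explore_then_commit([0.0, 1.0], horizon=100, explore_per_arm=5)
print(committed, counts, total)
```

In this sketch the exploration budget is fixed in advance and split evenly across arms, so each arm's share of exploration pulls does not adapt to observed rewards. This differs from UCB-style rules, whose pull counts depend on the reward history.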
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The problem of best arm identification (BAI) with minimal regret combines regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Regression with Multi-Expert Deferral [30.389055604165222]
Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts.
We present a novel framework of regression with deferral, which involves deferring the prediction to multiple experts.
We introduce new surrogate loss functions for both scenarios and prove that they are supported by $H$-consistency bounds.
arXiv Detail & Related papers (2024-03-28T15:26:38Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Multi-armed Bandit Algorithm against Strategic Replication [5.235979896921492]
We consider a multi-armed bandit problem in which a set of arms is registered by each agent, and the agent receives reward when its arm is selected.
An agent might strategically submit more arms with replications, which can bring more reward by abusing the bandit algorithm's exploration-exploitation balance.
We propose a bandit algorithm which demotivates replications and also achieves a small cumulative regret.
arXiv Detail & Related papers (2021-10-23T07:38:44Z) - The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.