Multi-Receiver Online Bayesian Persuasion
- URL: http://arxiv.org/abs/2106.06480v1
- Date: Fri, 11 Jun 2021 16:05:31 GMT
- Title: Multi-Receiver Online Bayesian Persuasion
- Authors: Matteo Castiglioni, Alberto Marchesi, Andrea Celli, Nicola Gatti
- Abstract summary: We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online gradient descent scheme to handle online learning problems with a finite number of possible loss functions.
- Score: 51.94795123103707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian persuasion studies how an informed sender should partially disclose
information to influence the behavior of a self-interested receiver. Classical
models make the stringent assumption that the sender knows the receiver's
utility. This can be relaxed by considering an online learning framework in
which the sender repeatedly faces a receiver of an unknown, adversarially
selected type. We study, for the first time, an online Bayesian persuasion
setting with multiple receivers. We focus on the case with no externalities and
binary actions, as customary in offline models. Our goal is to design no-regret
algorithms for the sender with polynomial per-iteration running time. First, we
prove a negative result: for any $0 < \alpha \leq 1$, there is no
polynomial-time no-$\alpha$-regret algorithm when the sender's utility function
is supermodular or anonymous. Then, we focus on the case of submodular sender's
utility functions and we show that, in this case, it is possible to design a
polynomial-time no-$(1 - \frac{1}{e})$-regret algorithm. To do so, we introduce
a general online gradient descent scheme to handle online learning problems
with a finite number of possible loss functions. This requires the existence of
an approximate projection oracle. We show that, in our setting, there exists
one such projection oracle which can be implemented in polynomial time.
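The abstract's scheme pairs online gradient descent with an approximate projection oracle. Below is a minimal, generic sketch of that pattern; the names (`online_gradient_descent`, `approx_project`, `loss_grad`), the step size, and the simplex toy example are illustrative assumptions, not the paper's construction, whose oracle projects onto its specific polytope of sender strategies.

```python
import numpy as np

def online_gradient_descent(loss_grad, approx_project, x0, T, eta):
    """Online projected gradient descent with a (possibly approximate)
    projection oracle.

    loss_grad: (t, x) -> gradient of the loss revealed at round t
    approx_project: maps an arbitrary point to a (near-)feasible point
    """
    x = approx_project(x0)               # start from a (near-)feasible point
    iterates = [x]
    for t in range(T):
        g = loss_grad(t, x)              # gradient of the adversarial loss
        x = approx_project(x - eta * g)  # descent step, then project back
        iterates.append(x)
    return iterates

# Toy usage: losses are linear over the probability simplex, and the
# "approximate" oracle is an exact Euclidean projection onto the simplex.
def project_simplex(v):
    # Euclidean projection onto {x >= 0, sum(x) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(0)
grads = lambda t, x: rng.normal(size=x.size)  # stand-in adversarial losses
xs = online_gradient_descent(grads, project_simplex, np.ones(4) / 4, T=100, eta=0.1)
```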
Related papers
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals (a toy sketch of this interaction pattern follows this entry).
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
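As referenced above, here is a toy sketch of that interaction pattern: agents run UCB over size-k subsets treated as meta-arms, see only an aggregate noisy reward per subset, and average their statistics at fixed intervals. The names (`federated_subset_bandits`, `subset_reward`) and the UCB choice are illustrative assumptions, not the paper's algorithm.

```python
import itertools, math, random

def federated_subset_bandits(n_arms, k, n_agents, horizon, sync_every, subset_reward):
    """Toy federated combinatorial bandit: each agent runs UCB over size-k
    subsets (meta-arms), observing only a noisy aggregate reward per subset;
    agents average their statistics every `sync_every` rounds."""
    subsets = list(itertools.combinations(range(n_arms), k))
    counts = [[0.0] * len(subsets) for _ in range(n_agents)]
    sums = [[0.0] * len(subsets) for _ in range(n_agents)]
    for t in range(1, horizon + 1):
        for a in range(n_agents):
            def ucb(i):  # optimistic index; unplayed subsets get priority
                if counts[a][i] == 0:
                    return float("inf")
                return sums[a][i] / counts[a][i] + math.sqrt(2 * math.log(t) / counts[a][i])
            i = max(range(len(subsets)), key=ucb)
            r = subset_reward(subsets[i]) + random.gauss(0.0, 0.1)  # aggregate reward only
            counts[a][i] += 1
            sums[a][i] += r
        if t % sync_every == 0:  # cooperation step: agents average statistics
            for i in range(len(subsets)):
                c = sum(counts[a][i] for a in range(n_agents)) / n_agents
                s = sum(sums[a][i] for a in range(n_agents)) / n_agents
                for a in range(n_agents):
                    counts[a][i], sums[a][i] = c, s
    return counts, sums

# Example: the aggregate reward of a subset is the mean quality of its arms.
quality = [0.2, 0.8, 0.5, 0.9]
stats = federated_subset_bandits(
    n_arms=4, k=2, n_agents=3, horizon=200, sync_every=20,
    subset_reward=lambda s: sum(quality[j] for j in s) / len(s))
```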
- Learning How to Strategically Disclose Information [6.267574471145217]
We consider an online version of information design where a sender interacts with a receiver of an unknown type.
We show that $\mathcal{O}(\sqrt{T})$ regret is achievable with full information feedback.
We also propose a novel parametrization that allows the sender to achieve $\mathcal{O}(\sqrt{T})$ regret for a general convex utility function.
arXiv Detail & Related papers (2024-03-13T17:44:16Z)
- Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z)
- Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application [17.64363983613468]
We formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm, MUD-EXP3.
In this paper, we consider delayed feedback that arrives from multiple users, with no restriction on its internal distribution (the classical EXP3-style update that such algorithms build on is sketched after this entry).
arXiv Detail & Related papers (2023-10-17T12:08:15Z)
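For context, here is a minimal sketch of the classical EXP3 update adapted to feedback that may arrive late. The callbacks `play` and `get_feedback` and all names are assumptions for illustration; this is not the paper's MUD-EXP3 algorithm.

```python
import math, random

def exp3_with_delays(n_arms, horizon, eta, play, get_feedback):
    """EXP3 with importance-weighted updates applied whenever (possibly
    delayed) feedback arrives. `play(t, arm)` executes the pull;
    `get_feedback(t)` returns a list of (round, loss) pairs that became
    available at time t, possibly for several past rounds at once."""
    weights = [1.0] * n_arms
    played = {}  # round -> (arm, probability with which it was played)
    for t in range(horizon):
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        played[t] = (arm, probs[arm])
        play(t, arm)
        for s, loss in get_feedback(t):          # late feedback for round s
            a, p = played[s]
            loss_est = loss / p                  # importance-weighted estimate
            weights[a] *= math.exp(-eta * loss_est)
    return weights
```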
- Selective Sampling and Imitation Learning via Online Regression [17.73844193143454]
We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback.
We provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback (see the sketch after this entry).
arXiv Detail & Related papers (2023-07-11T03:32:20Z)
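The selective-sampling idea can be illustrated with a margin-based query rule. This generic perceptron-style sketch (the names and the threshold rule are assumptions) stands in for the paper's online-regression-based algorithm.

```python
import numpy as np

def selective_sampling_imitation(stream, expert, dim, threshold=0.5):
    """Query the noisy expert only when the current predictor is uncertain
    (small margin); update perceptron-style on disagreement. A generic
    selective-sampling sketch, not the paper's algorithm."""
    w = np.zeros(dim)
    n_queries = 0
    for x in stream:                      # stream of feature vectors
        score = w @ x
        if abs(score) <= threshold:       # uncertain: pay to query the expert
            y = expert(x)                 # noisy label in {-1, +1}
            n_queries += 1
            if np.sign(score) != y:       # imitate the expert on mistakes
                w = w + y * x
        # otherwise act on the confident prediction without querying
    return w, n_queries
```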
- Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z)
- Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
A salient aspect of the setting we consider is that the data may be non-realisable, due to which abstention may be a valid long-term action.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^{\mu}$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries (a toy versioning rule is sketched after this entry).
arXiv Detail & Related papers (2021-10-27T08:00:53Z)
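As a toy illustration of a versioning-based abstention rule (assumed names; not the paper's scheme): keep a pool of plausible predictors, abstain whenever they disagree, and prune predictors contradicted by whatever labels happen to be revealed.

```python
def versioned_selective_classifier(hypotheses, stream):
    """Toy version-space scheme: predict only when all surviving hypotheses
    agree, abstain otherwise; prune hypotheses falsified by revealed labels.
    `stream` yields (x, label_or_None); labels may be withheld (limited
    feedback). Illustrative only."""
    live = list(hypotheses)
    for x, label in stream:
        votes = {h(x) for h in live}
        prediction = votes.pop() if len(votes) == 1 else None  # None = abstain
        yield x, prediction
        if label is not None:  # feedback available: prune inconsistent versions
            # keep the old pool if pruning would empty it (non-realisable data)
            live = [h for h in live if h(x) == label] or live
```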
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.