Sequential Information Design: Learning to Persuade in the Dark
- URL: http://arxiv.org/abs/2209.03927v1
- Date: Thu, 8 Sep 2022 17:08:12 GMT
- Title: Sequential Information Design: Learning to Persuade in the Dark
- Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola
Gatti, Francesco Trovo
- Abstract summary: We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
- Score: 49.437419242582884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a repeated information design problem faced by an informed sender
who tries to influence the behavior of a self-interested receiver. We consider
settings where the receiver faces a sequential decision making (SDM) problem.
At each round, the sender observes the realizations of random events in the SDM
problem. This begets the challenge of how to incrementally disclose such
information to the receiver to persuade them to follow (desirable) action
recommendations. We study the case in which the sender does not know the
probabilities of the random events, and thus has to gradually learn them while
persuading the receiver. We start by providing a non-trivial polytopal
approximation of the set of sender's persuasive information structures. This is
crucial to design efficient learning algorithms. Next, we prove a negative
result: no learning algorithm can be persuasive. Thus, we relax persuasiveness
requirements by focusing on algorithms that guarantee that the receiver's
regret in following recommendations grows sub-linearly. In the full-feedback
setting -- where the sender observes the realizations of all random events -- we
provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and
the receiver. Instead, in the bandit-feedback setting -- where the sender only
observes the realizations of random events actually occurring in the SDM
problem -- we design an algorithm that, given an $\alpha \in [1/2, 1]$ as
input, ensures $\tilde{O}(T^{\alpha})$ and $\tilde{O}(T^{\max\{\alpha, 1-\frac{\alpha}{2}\}})$ regrets for the sender and
the receiver, respectively. This result is complemented by a lower bound showing
that such a regret trade-off is essentially tight.
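As a quick illustration of the sender/receiver trade-off stated above, the short Python sketch below (illustrative only, not code from the paper) plugs a few values of $\alpha$ into the two regret exponents, $\alpha$ for the sender and $\max\{\alpha, 1-\frac{\alpha}{2}\}$ for the receiver.

```python
# Minimal sketch: evaluate the regret exponents reported for the
# bandit-feedback algorithm. Sender regret grows as ~T^alpha and receiver
# regret as ~T^{max(alpha, 1 - alpha/2)}, for a chosen alpha in [1/2, 1].

def regret_exponents(alpha: float) -> tuple[float, float]:
    """Return (sender exponent, receiver exponent) for a given alpha."""
    assert 0.5 <= alpha <= 1.0, "alpha must lie in [1/2, 1]"
    sender = alpha
    receiver = max(alpha, 1.0 - alpha / 2.0)
    return sender, receiver

if __name__ == "__main__":
    for alpha in (0.5, 2.0 / 3.0, 0.75, 1.0):
        s, r = regret_exponents(alpha)
        print(f"alpha={alpha:.3f}: sender ~ T^{s:.3f}, receiver ~ T^{r:.3f}")
```

Under these formulas, $\alpha = 1/2$ favours the sender ($T^{1/2}$ vs. $T^{3/4}$), $\alpha = 2/3$ equalizes both exponents at $T^{2/3}$, and values above $2/3$ worsen both.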
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, robust contextual dueling bandit (algo), based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Learning How to Strategically Disclose Information [6.267574471145217]
We consider an online version of information design where a sender interacts with a receiver of an unknown type.
We show that $\mathcal{O}(\sqrt{T})$ regret is achievable with full information feedback.
We also propose a novel parametrization that allows the sender to achieve $\mathcal{O}(\sqrt{T})$ regret for a general convex utility function.
arXiv Detail & Related papers (2024-03-13T17:44:16Z)
- Markov Persuasion Processes: Learning to Persuade from Scratch [37.92189925462977]
In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions.
We design a learning algorithm for the sender, working with partial feedback.
We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes.
arXiv Detail & Related papers (2024-02-05T15:09:41Z)
- Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z)
- Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z)
- Learning to Persuade on the Fly: Robustness Against Ignorance [26.915262694667746]
We study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution.
The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information.
In contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly.
arXiv Detail & Related papers (2021-02-19T21:02:15Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret [115.85354306623368]
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels.
We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ).
We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda \ldots$
arXiv Detail & Related papers (2020-06-22T19:28:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.