Markov Persuasion Processes: Learning to Persuade from Scratch
- URL: http://arxiv.org/abs/2402.03077v2
- Date: Wed, 6 Mar 2024 12:37:20 GMT
- Title: Markov Persuasion Processes: Learning to Persuade from Scratch
- Authors: Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni,
Alberto Marchesi, Nicola Gatti
- Abstract summary: In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions.
We design a learning algorithm for the sender, working with partial feedback.
We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes.
- Score: 37.92189925462977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Bayesian persuasion, an informed sender strategically discloses
information to a receiver so as to persuade them to undertake desirable
actions. Recently, growing attention has been devoted to settings in which the
sender and receivers interact sequentially. In particular, Markov persuasion
processes (MPPs) have been introduced to capture sequential scenarios where a
sender faces a stream of myopic receivers in a Markovian environment. The MPPs
studied so far in the literature suffer from issues that prevent them from
being fully operational in practice, e.g., they assume that the sender knows
receivers' rewards. We fix such issues by addressing MPPs where the sender has
no knowledge about the environment. We design a learning algorithm for the
sender, working with partial feedback. We prove that its regret with respect to
an optimal information-disclosure policy grows sublinearly in the number of
episodes, as does the persuasiveness loss accumulated while learning. Moreover,
we provide a lower bound for our setting that matches the guarantees of our
algorithm.
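To make the persuasion model concrete, below is a minimal sketch that computes an optimal direct signaling scheme for a one-shot, two-state, two-action Bayesian persuasion instance by solving the standard obedience linear program. The prior, reward functions, and use of scipy are illustrative assumptions, not taken from the paper; in the paper's setting the sender does not know the receivers' rewards and must learn from partial feedback instead.

```python
# A minimal sketch (not the paper's algorithm): compute an optimal direct
# signaling scheme for a one-shot Bayesian persuasion instance with *known*
# receiver rewards, via the standard obedience LP. The prior and the reward
# functions below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

states, actions = [0, 1], [0, 1]   # e.g. 0 = innocent/acquit, 1 = guilty/convict
mu = np.array([0.7, 0.3])          # prior over states (assumed)

def u_receiver(theta, a):          # receiver wants to match the state
    return 1.0 if a == theta else 0.0

def u_sender(theta, a):            # sender always prefers action 1
    return 1.0 if a == 1 else 0.0

def idx(theta, a):                 # flatten phi(a | theta) into a vector
    return theta * len(actions) + a

n = len(states) * len(actions)

# Objective: maximize the sender's expected utility -> minimize its negation.
c = np.zeros(n)
for theta in states:
    for a in actions:
        c[idx(theta, a)] = -mu[theta] * u_sender(theta, a)

# Obedience (persuasiveness): the recommended action beats every deviation.
A_ub, b_ub = [], []
for a in actions:
    for b in actions:
        if b == a:
            continue
        row = np.zeros(n)
        for theta in states:
            row[idx(theta, a)] = -mu[theta] * (u_receiver(theta, a) - u_receiver(theta, b))
        A_ub.append(row)
        b_ub.append(0.0)

# phi(. | theta) must be a probability distribution for every state.
A_eq = np.zeros((len(states), n))
b_eq = np.ones(len(states))
for theta in states:
    for a in actions:
        A_eq[theta, idx(theta, a)] = 1.0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * n)
print("optimal sender utility:", -res.fun)                        # 0.6 here
print("phi(a | theta):\n", res.x.reshape(len(states), len(actions)))
```

For this particular instance the optimal sender utility is 0.6, i.e., twice the prior probability of the state the sender wants acted upon; sequential models such as MPPs chain many such persuasion steps through a Markovian state.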
Related papers
- On the loss of context-awareness in general instruction fine-tuning [101.03941308894191]
Post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs can harm existing capabilities learned during pretraining.
We propose two methods to mitigate the loss of context awareness in instruct models: post-hoc attention steering on user prompts and conditional instruction fine-tuning with a context-dependency indicator.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
- Randomized Confidence Bounds for Stochastic Partial Monitoring [8.649322557020666]
The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback (a minimal example follows this entry).
In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round.
We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds.
arXiv Detail & Related papers (2024-02-07T16:18:59Z)
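To illustrate what incomplete feedback means in partial monitoring, here is a minimal sketch of the classic apple-tasting game, in which losses are never observed directly and feedback depends on the chosen action. The explore-then-commit strategy and all parameters are assumptions for illustration; this is not the randomized confidence-bound method proposed in the paper above.

```python
# A minimal sketch of a partial-monitoring game: the classic "apple tasting"
# instance, where losses are never observed directly and feedback depends on
# the chosen action. The explore-then-commit strategy below is purely
# illustrative; it is NOT the randomized confidence-bound method of the paper.
import random

# Outcomes: 0 = good apple, 1 = rotten apple.  Actions: 0 = sell, 1 = taste.
LOSS = [[0.0, 1.0],              # selling: costs 1 only if the apple is rotten
        [1.0, 0.0]]              # tasting: costs 1 only if the apple is good
FEEDBACK = [["none", "none"],    # selling reveals nothing about the outcome
            ["good", "rotten"]]  # tasting reveals the outcome exactly

def play(T=10_000, p_rotten=0.1, explore_rounds=500, seed=0):
    rng = random.Random(seed)
    tasted, rotten_seen, total_loss = 0, 0, 0.0
    for t in range(T):
        outcome = 1 if rng.random() < p_rotten else 0  # hidden from the learner
        if t < explore_rounds:
            action = 1                                  # explore: taste
        else:                                           # commit: sell iff selling
            est = rotten_seen / max(tasted, 1)          # looks cheaper on average
            action = 0 if est < 0.5 else 1
        total_loss += LOSS[action][outcome]             # incurred but NOT observed
        signal = FEEDBACK[action][outcome]              # the only observation
        if action == 1:
            tasted += 1
            rotten_seen += (signal == "rotten")
    return total_loss / T

print("average loss:", play())
```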
- Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This raises the challenge of how to incrementally disclose such information to the receiver so as to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion process (MPP).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the sender's optimal long-term cumulative utility (a schematic formulation follows this entry).
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
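The planning problem in the entry above can be stated schematically as the following program, a sketch under assumed notation rather than the paper's exact formulation: the sender picks a signaling policy $\phi_h(a \mid s, \theta)$ that maximizes its cumulative utility subject to a per-step obedience constraint for each myopic receiver.

```latex
% Schematic MPP planning program (notation assumed, not taken from the paper):
% maximize the sender's long-term utility over signaling policies \phi, while
% keeping every recommendation obedient for the current myopic receiver.
\begin{aligned}
\max_{\phi}\quad & \mathbb{E}_{\phi}\Big[\sum_{h=1}^{H} r^{S}_h(s_h,\theta_h,a_h)\Big] \\
\text{s.t.}\quad & \sum_{\theta}\mu_h(\theta \mid s)\,\phi_h(a \mid s,\theta)\,
\big[r^{R}_h(s,\theta,a)-r^{R}_h(s,\theta,a')\big]\ \ge\ 0
\qquad \forall\, h,\ s,\ a,\ a'.
\end{aligned}
```

Here $s$ is the Markovian state, $\theta$ the payoff-relevant outcome with prior $\mu_h(\cdot\mid s)$, $a$ the recommended action, and $r^{S}$, $r^{R}$ the sender's and receiver's rewards; the constraint says that obeying the recommendation is a best response for the myopic receiver at every step.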
- Learning to Persuade on the Fly: Robustness Against Ignorance [26.915262694667746]
We study repeated persuasion between a sender and a stream of receivers where, at each time, the sender observes a payoff-relevant state drawn i.i.d. from an unknown distribution.
The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information.
In contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly.
arXiv Detail & Related papers (2021-02-19T21:02:15Z)
- Correcting Experience Replay for Multi-Agent Communication [18.12281605882891]
We consider the problem of learning to communicate using multi-agent reinforcement learning (MARL).
A common approach is to learn off-policy, using data sampled from a replay buffer.
We introduce a 'communication correction' which accounts for the non-stationarity of observed communication induced by MARL.
arXiv Detail & Related papers (2020-10-02T20:49:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.