Learning to Persuade on the Fly: Robustness Against Ignorance
- URL: http://arxiv.org/abs/2102.10156v2
- Date: Fri, 3 May 2024 05:08:29 GMT
- Title: Learning to Persuade on the Fly: Robustness Against Ignorance
- Authors: You Zu, Krishnamurthy Iyer, Haifeng Xu
- Abstract summary: We study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution.
The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information.
In contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly.
- Score: 26.915262694667746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by information sharing in online platforms, we study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution, and shares state information with the receivers who each choose an action. The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information. However, in contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly. We study the sender's learning problem of making persuasive action recommendations to achieve low regret against the optimal persuasion mechanism with the knowledge of the distribution. To do this, we first propose and motivate a persuasiveness criterion for the unknown distribution setting that centers robustness as a requirement in the face of uncertainty. Our main result is an algorithm that, with high probability, is robustly-persuasive and achieves $O(\sqrt{T\log T})$ regret, where $T$ is the horizon length. Intuitively, at each time our algorithm maintains a set of candidate distributions, and chooses a signaling mechanism that is simultaneously persuasive for all of them. Core to our proof is a tight analysis about the cost of robust persuasion, which may be of independent interest. We further prove that this regret order is optimal (up to logarithmic terms) by showing that no algorithm can achieve regret better than $\Omega(\sqrt{T})$.
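The candidate-set idea in the abstract can be illustrated in a toy binary-state model. The sketch below is a hypothetical simplification, not the paper's algorithm: the state, threshold, and radius are all illustrative, and the paper treats general state spaces via a persuasiveness criterion robust over a set of candidate distributions.

```python
def robust_buy_prob_when_bad(mu_hat, radius, threshold=0.5):
    """Toy robust signaling rule for binary-state Bayesian persuasion.

    Assumed toy model (not the paper's general setting): the state is
    'good' with unknown probability mu; the receiver buys iff the
    posterior P(good | recommendation) >= threshold. With a known prior,
    the sender recommends 'buy' always in the good state and, in the bad
    state, with probability q = mu*(1-threshold) / ((1-mu)*threshold),
    capped at 1, which makes the obedience constraint bind.

    With mu unknown, the sender keeps a candidate set
    [mu_hat - radius, mu_hat + radius] (e.g. radius ~ sqrt(log T / n)
    from a concentration bound) and evaluates q at its lower end, so the
    recommendation is persuasive for every candidate prior at once.
    """
    mu_lo = max(0.0, mu_hat - radius)   # most pessimistic candidate prior
    if mu_lo >= threshold:
        return 1.0                      # persuasive even without signaling
    if mu_lo == 0.0:
        return 0.0                      # worst case: never bluff in the bad state
    return min(1.0, mu_lo * (1.0 - threshold) / ((1.0 - mu_lo) * threshold))
```

Shrinking `radius` as observations accrue recovers the known-prior rule in the limit; the gap between the robust and known-prior values of `q` is one way to picture the "cost of robust persuasion" the abstract analyzes.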
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
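The uncertainty-weighting idea can be sketched roughly as follows. This is a hedged illustration only: the model (a logistic preference model over context differences), the weights, and the update are stand-ins for the paper's actual estimator.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_mle_step(theta, samples, weights, lr=0.5):
    """One gradient step on a weighted logistic log-likelihood.

    samples: list of (x, y) where x is a feature vector (say, the
    difference of the two dueling contexts) and y in {0, 1} is the
    possibly adversarially flipped preference label.
    weights: per-sample weights, meant to be small for high-uncertainty
    samples so that flipping their labels has limited influence.
    All names here are illustrative.
    """
    grad = [0.0] * len(theta)
    for (x, y), w in zip(samples, weights):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for i, xi in enumerate(x):
            grad[i] += w * (p - y) * xi   # gradient of weighted neg. log-lik.
    return [t - lr * g for t, g in zip(theta, grad)]
```

A sample given weight zero contributes nothing to the update, which is the basic mechanism by which down-weighting limits an adversary's leverage.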
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Learning How to Strategically Disclose Information [6.267574471145217]
We consider an online version of information design where a sender interacts with a receiver of an unknown type.
We show that $\mathcal{O}(\sqrt{T})$ regret is achievable with full information feedback.
We also propose a novel parametrization that allows the sender to achieve $\mathcal{O}(\sqrt{T})$ regret for a general convex utility function.
arXiv Detail & Related papers (2024-03-13T17:44:16Z) - Markov Persuasion Processes: Learning to Persuade from Scratch [37.92189925462977]
In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions.
We design a learning algorithm for the sender, working with partial feedback.
We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes.
arXiv Detail & Related papers (2024-02-05T15:09:41Z) - Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
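A minimal sketch of the known-policy case (illustrative names and a greedy mediator-selection heuristic, not the paper's strategy):

```python
from collections import defaultdict

def estimate_arm_means(observations):
    """observations: (arm, reward) pairs collected through mediators.
    Because the pulled arm is observed, plain per-arm averaging applies."""
    sums, counts = defaultdict(float), defaultdict(int)
    for arm, reward in observations:
        sums[arm] += reward
        counts[arm] += 1
    return {arm: sums[arm] / counts[arm] for arm in sums}

def pick_mediator(policies, target_arm):
    """policies[m] maps arm -> probability that mediator m pulls it.
    Greedily choose the mediator most likely to pull the arm we want
    more data on (a heuristic stand-in for the paper's strategy)."""
    return max(range(len(policies)),
               key=lambda m: policies[m].get(target_arm, 0.0))
```

The learner cannot pull arms directly, so the choice of mediator is the only lever for steering exploration toward the candidate best arm.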
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z) - Peer Selection with Noisy Assessments [43.307040330622186]
We extend PeerNomination, the most accurate peer reviewing algorithm to date, into WeightedPeerNomination.
We show analytically that a weighting scheme can improve the overall accuracy of the selection significantly.
arXiv Detail & Related papers (2021-07-21T14:47:11Z) - VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values.
We first define three notions of regret: for the welfare, for each agent's individual utility, and for the mechanism.
Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets.
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.