Persuading a Behavioral Agent: Approximately Best Responding and
Learning
- URL: http://arxiv.org/abs/2302.03719v2
- Date: Thu, 22 Feb 2024 05:43:12 GMT
- Title: Persuading a Behavioral Agent: Approximately Best Responding and
Learning
- Authors: Yiling Chen, Tao Lin
- Abstract summary: We study a relaxation of the Bayesian persuasion model where the receiver can approximately best respond to the sender's signaling scheme.
We show that, under natural assumptions, the sender can find a signaling scheme that guarantees itself an expected utility almost as good as its optimal utility.
- Score: 7.378697321839991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classic Bayesian persuasion model assumes a Bayesian and best-responding
receiver. We study a relaxation of the Bayesian persuasion model where the
receiver can approximately best respond to the sender's signaling scheme. We
show that, under natural assumptions, (1) the sender can find a signaling
scheme that guarantees itself an expected utility almost as good as its optimal
utility in the classic model, no matter what approximately best-responding
strategy the receiver uses; (2) on the other hand, there is no signaling scheme
that gives the sender much more utility than its optimal utility in the classic
model, even if the receiver uses the approximately best-responding strategy
that is best for the sender. Together, (1) and (2) imply that the approximately
best-responding behavior of the receiver does not affect the sender's maximal
achievable utility a lot in the Bayesian persuasion problem. The proofs of both
results rely on the idea of robustification of a Bayesian persuasion scheme:
given a pair of the sender's signaling scheme and the receiver's strategy, we
can construct another signaling scheme such that the receiver prefers to use
that strategy in the new scheme more than in the original scheme, and the two
schemes give the sender similar utilities. As an application of our main result
(1), we show that, in a repeated Bayesian persuasion model where the receiver
learns to respond to the sender by some algorithms, the sender can do almost as
well as in the classic model. Interestingly, unlike (2), with a learning
receiver the sender can sometimes do much better than in the classic model.
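The robustification idea behind result (1) can be illustrated numerically on the textbook binary-state persuasion example (a sketch under assumed parameters, not the paper's actual construction): the sender pushes the receiver's posterior strictly above the action threshold by a small margin, so that acting remains preferable even for an approximately best-responding receiver, at only a small cost in sender utility.

```python
# Hypothetical sketch (illustrative parameters, not from the paper):
# binary-state Bayesian persuasion where the receiver acts iff the
# posterior on the sender-preferred state reaches a threshold.

def optimal_scheme(mu, threshold):
    """Return P(send 'act' | bad state) and the sender's expected utility.

    mu: prior probability of the sender-preferred ("good") state.
    In the good state the sender always sends 'act'; in the bad state it
    sends 'act' with probability p chosen so the posterior after 'act'
    equals exactly `threshold`.
    """
    p = mu * (1 - threshold) / ((1 - mu) * threshold)
    utility = mu + (1 - mu) * p  # P(receiver acts) == mu / threshold
    return p, utility

mu = 0.3  # assumed prior on the good state

# Classic model: the receiver best responds and acts at posterior >= 0.5.
_, u_classic = optimal_scheme(mu, 0.5)        # sender utility 0.6

# Robustified scheme: target posterior 0.5 + eps instead, leaving the
# receiver a strict incentive margin, so a delta-approximate best
# responder (for small enough delta) still acts. Sender utility drops
# from mu/0.5 to mu/(0.5 + eps), an O(eps) loss, matching result (1).
eps = 0.01
_, u_robust = optimal_scheme(mu, 0.5 + eps)   # mu / 0.51, about 0.588
```

The margin `eps` trades off robustness against sender utility: as `eps` shrinks, the robustified utility approaches the classic optimum, which is the sense in which approximate best responding costs the sender almost nothing.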
Related papers
- Efficient Model-agnostic Alignment via Bayesian Persuasion [13.42367964190663]
We introduce a model-agnostic and lightweight Bayesian Persuasion Alignment framework.
In the persuasion process, the small model (Advisor) observes the information item (i.e., the state) and persuades the large model (Receiver) to elicit improved responses.
We show that GPT-2 can significantly improve the performance of various models, achieving an average enhancement of 16.1% in mathematical reasoning ability and 13.7% in code generation.
arXiv Detail & Related papers (2024-05-29T02:57:07Z) - Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z) - Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that, by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally exhibit a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific, possibly unknown, policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Provable Benefits of Policy Learning from Human Preferences in
Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z) - Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z) - Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z) - Learning to Persuade on the Fly: Robustness Against Ignorance [26.915262694667746]
We study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution.
The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information.
In contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly.
arXiv Detail & Related papers (2021-02-19T21:02:15Z) - BLOB : A Probabilistic Model for Recommendation that Combines Organic
and Bandit Signals [12.83118601099289]
We propose a probabilistic approach to combine the 'organic' and 'bandit' signals in order to improve the estimation of recommendation quality.
We show using extensive simulation studies that our method outperforms or matches the value of state-of-the-art organic-based recommendation algorithms.
arXiv Detail & Related papers (2020-08-28T06:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.