Persuading a Behavioral Agent: Approximately Best Responding and
Learning
- URL: http://arxiv.org/abs/2302.03719v2
- Date: Thu, 22 Feb 2024 05:43:12 GMT
- Title: Persuading a Behavioral Agent: Approximately Best Responding and
Learning
- Authors: Yiling Chen, Tao Lin
- Abstract summary: We study a relaxation of the Bayesian persuasion model where the receiver can approximately best respond to the sender's signaling scheme.
We show that, under natural assumptions, the sender can find a signaling scheme that guarantees itself an expected utility almost as good as its optimal utility.
- Score: 7.378697321839991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classic Bayesian persuasion model assumes a Bayesian and best-responding
receiver. We study a relaxation of the Bayesian persuasion model where the
receiver can approximately best respond to the sender's signaling scheme. We
show that, under natural assumptions, (1) the sender can find a signaling
scheme that guarantees itself an expected utility almost as good as its optimal
utility in the classic model, no matter what approximately best-responding
strategy the receiver uses; (2) on the other hand, there is no signaling scheme
that gives the sender much more utility than its optimal utility in the classic
model, even if the receiver uses the approximately best-responding strategy
that is best for the sender. Together, (1) and (2) imply that the approximately
best-responding behavior of the receiver does not affect the sender's maximal
achievable utility a lot in the Bayesian persuasion problem. The proofs of both
results rely on the idea of robustification of a Bayesian persuasion scheme:
given a pair of the sender's signaling scheme and the receiver's strategy, we
can construct another signaling scheme such that the receiver prefers to use
that strategy in the new scheme more than in the original scheme, and the two
schemes give the sender similar utilities. As an application of our main result
(1), we show that, in a repeated Bayesian persuasion model where the receiver
learns to respond to the sender by some algorithms, the sender can do almost as
well as in the classic model. Interestingly, unlike (2), with a learning
receiver the sender can sometimes do much better than in the classic model.
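The robustification idea behind result (1) can be illustrated numerically on the textbook binary-state persuasion example (a sketch under assumed parameters, not the paper's actual construction): the sender pushes the receiver's posterior strictly above the action threshold by a small margin, so that acting remains preferable even for an approximately best-responding receiver, at only a small cost in sender utility.

```python
# Hypothetical sketch (illustrative parameters, not from the paper):
# binary-state Bayesian persuasion where the receiver acts iff the
# posterior on the sender-preferred state reaches a threshold.

def optimal_scheme(mu, threshold):
    """Return P(send 'act' | bad state) and the sender's expected utility.

    mu: prior probability of the sender-preferred ("good") state.
    In the good state the sender always sends 'act'; in the bad state it
    sends 'act' with probability p chosen so the posterior after 'act'
    equals exactly `threshold`.
    """
    p = mu * (1 - threshold) / ((1 - mu) * threshold)
    utility = mu + (1 - mu) * p  # P(receiver acts) == mu / threshold
    return p, utility

mu = 0.3  # assumed prior on the good state

# Classic model: the receiver best responds and acts at posterior >= 0.5.
_, u_classic = optimal_scheme(mu, 0.5)        # sender utility 0.6

# Robustified scheme: target posterior 0.5 + eps instead, leaving the
# receiver a strict incentive margin, so a delta-approximate best
# responder (for small enough delta) still acts. Sender utility drops
# from mu/0.5 to mu/(0.5 + eps), an O(eps) loss, matching result (1).
eps = 0.01
_, u_robust = optimal_scheme(mu, 0.5 + eps)   # mu / 0.51, about 0.588
```

The margin `eps` trades off robustness against sender utility: as `eps` shrinks, the robustified utility approaches the classic optimum, which is the sense in which approximate best responding costs the sender almost nothing.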
Related papers
- Efficient Model-agnostic Alignment via Bayesian Persuasion [13.42367964190663]
We introduce a model-agnostic and lightweight Bayesian Persuasion Alignment framework.
In the persuasion process, the small model (Advisor) observes the information item (i.e., the state) and persuades the large model (Receiver) to elicit improved responses.
We show that GPT-2 can significantly improve the performance of various models, achieving an average enhancement of 16.1% in mathematical reasoning ability and 13.7% in code generation.
arXiv Detail & Related papers (2024-05-29T02:57:07Z) - Algorithmic Persuasion Through Simulation [51.23082754429737]
We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product.
The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities.
Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
arXiv Detail & Related papers (2023-11-29T23:01:33Z) - Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that, by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally exhibit a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific, possibly unknown, policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Provable Benefits of Policy Learning from Human Preferences in
Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z) - Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z) - Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z) - Learning to Persuade on the Fly: Robustness Against Ignorance [26.915262694667746]
We study repeated persuasion between a sender and a stream of receivers where at each time, the sender observes a payoff-relevant state drawn independently and identically from an unknown distribution.
The sender seeks to persuade the receivers into taking actions aligned with the sender's preference by selectively sharing state information.
In contrast to the standard models, neither the sender nor the receivers know the distribution, and the sender has to persuade while learning the distribution on the fly.
arXiv Detail & Related papers (2021-02-19T21:02:15Z) - BLOB : A Probabilistic Model for Recommendation that Combines Organic
and Bandit Signals [12.83118601099289]
We propose a probabilistic approach to combine the 'organic' and 'bandit' signals in order to improve the estimation of recommendation quality.
We show using extensive simulation studies that our method outperforms or matches the value of state-of-the-art organic-based recommendation algorithms.
arXiv Detail & Related papers (2020-08-28T06:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.