Sequential Principal-Agent Problems with Communication: Efficient
Computation and Learning
- URL: http://arxiv.org/abs/2306.03832v2
- Date: Sun, 17 Dec 2023 13:34:46 GMT
- Title: Sequential Principal-Agent Problems with Communication: Efficient
Computation and Learning
- Authors: Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
- Abstract summary: We study a sequential decision making problem between a principal and an agent with incomplete information on both sides.
In this model, the principal and the agent interact in a environment, and each is privy to observations about the state not available to the other.
We present an algorithmic-time algorithm to compute the principal's optimal policy up to an additive approximation.
- Score: 27.50523143132825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a sequential decision making problem between a principal and an
agent with incomplete information on both sides. In this model, the principal
and the agent interact in a stochastic environment, and each is privy to
observations about the state not available to the other. The principal has the
power of commitment, both to elicit information from the agent and to provide
signals about her own information. The principal and the agent communicate
their signals to each other, and select their actions independently based on
this communication. Each player receives a payoff based on the state and their
joint actions, and the environment moves to a new state. The interaction
continues over a finite time horizon, and both players act to optimize their
own total payoffs over the horizon. Our model encompasses as special cases
stochastic games of incomplete information and POMDPs, as well as sequential
Bayesian persuasion and mechanism design problems. We study both computation of
optimal policies and learning in our setting. While the general problems are
computationally intractable, we study algorithmic solutions under a conditional
independence assumption on the underlying state-observation distributions. We
present a polynomial-time algorithm to compute the principal's optimal policy
up to an additive approximation. Additionally, we show an efficient learning
algorithm in the case where the transition probabilities are not known
beforehand. The algorithm guarantees sublinear regret for both players.
Related papers
- Learning to Incentivize Information Acquisition: Proper Scoring Rules
Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z) - Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse linear reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require the computationally forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces the information while increasing the data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) private.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Cooperative Online Learning in Stochastic and Adversarial MDPs [50.62439652257712]
We study cooperative online learning in and adversarial Markov decision process (MDP)
In each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret.
We are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.
arXiv Detail & Related papers (2022-01-31T12:32:11Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each agent, and thus cannot be shared with others.
The policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Process (MDP) respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - BayGo: Joint Bayesian Learning and Information-Aware Graph Optimization [48.30183416069897]
BayGo is a novel fully decentralized joint Bayesian learning and graph optimization framework.
We show that our framework achieves faster convergence and higher accuracy compared to fully-connected and star topology graphs.
arXiv Detail & Related papers (2020-11-09T11:16:55Z) - End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z) - Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive [57.47546090379434]
We study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions.
We provide a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
arXiv Detail & Related papers (2020-02-12T18:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.