Related papers: Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

URL: http://arxiv.org/abs/2306.03832v2
Date: Sun, 17 Dec 2023 13:34:46 GMT
Title: Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning
Authors: Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
Abstract summary: We study a sequential decision making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a environment, and each is privy to observations about the state not available to the other. We present an algorithmic-time algorithm to compute the principal's optimal policy up to an additive approximation.
Score: 27.50523143132825
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study a sequential decision making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The principal and the agent communicate their signals to each other, and select their actions independently based on this communication. Each player receives a payoff based on the state and their joint actions, and the environment moves to a new state. The interaction continues over a finite time horizon, and both players act to optimize their own total payoffs over the horizon. Our model encompasses as special cases stochastic games of incomplete information and POMDPs, as well as sequential Bayesian persuasion and mechanism design problems. We study both computation of optimal policies and learning in our setting. While the general problems are computationally intractable, we study algorithmic solutions under a conditional independence assumption on the underlying state-observation distributions. We present a polynomial-time algorithm to compute the principal's optimal policy up to an additive approximation. Additionally, we show an efficient learning algorithm in the case where the transition probabilities are not known beforehand. The algorithm guarantees sublinear regret for both players.

Related papers

Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. We design a provably sample efficient algorithm that tailors the UCB algorithm to our model. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse linear reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. Most IRL techniques require the computationally forward problem -- computing an optimal policy given a reward function -- in POMDPs. We develop an algorithm that reduces the information while increasing the data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z)
Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit. In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine. We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) private. Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z)
Cooperative Online Learning in Stochastic and Adversarial MDPs [50.62439652257712]
We study cooperative online learning in and adversarial Markov decision process (MDP) In each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.
arXiv Detail & Related papers (2022-01-31T12:32:11Z)
Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work. In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each agent, and thus cannot be shared with others. The policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Process (MDP) respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z)
BayGo: Joint Bayesian Learning and Information-Aware Graph Optimization [48.30183416069897]
BayGo is a novel fully decentralized joint Bayesian learning and graph optimization framework. We show that our framework achieves faster convergence and higher accuracy compared to fully-connected and star topology graphs.
arXiv Detail & Related papers (2020-11-09T11:16:55Z)
End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games. We propose two approaches, respectively based on explicit and implicit differentiation. The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive [57.47546090379434]
We study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions. We provide a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
arXiv Detail & Related papers (2020-02-12T18:59:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.