Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes
- URL: http://arxiv.org/abs/2311.11789v2
- Date: Mon, 29 Apr 2024 18:05:39 GMT
- Title: Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes
- Authors: Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar,
- Abstract summary: We consider a cooperative multi-agent Markov decision process involving m agents.
In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents.
We propose approximate decentralized policy iteration algorithms using approximate linear programming with function approximation.
- Score: 5.842054972839244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents, incurring huge computational costs. Thus, recent works consider decentralized policy improvement, where each agent improves its decisions unilaterally, assuming that the decisions of the other agents are fixed. However, exact value functions are considered in the literature, which is computationally expensive for a large number of agents with high dimensional state-action space. Thus, we propose approximate decentralized policy iteration algorithms, using approximate linear programming with function approximation to compute the approximate value function for decentralized policy improvement. Further, we consider (both) cooperative multi-agent finite and infinite horizon discounted MDPs and propose suitable algorithms in each case. Moreover, we provide theoretical guarantees for our algorithms and also demonstrate their advantages over existing state-of-the-art algorithms in the literature.
Related papers
- Decentralised Q-Learning for Multi-Agent Markov Decision Processes with
a Satisfiability Criterion [0.0]
We propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP)
The goal is to lower the time average cost of each agent to below a pre-specified agent-specific bound.
arXiv Detail & Related papers (2023-11-21T13:56:44Z) - Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under
Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z) - Towards Global Optimality in Cooperative MARL with the Transformation
And Distillation Framework [26.612749327414335]
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL)
In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods.
We show that TAD-PPO can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each agent, and thus cannot be shared with others.
The policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Process (MDP) respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - Emergence of Theory of Mind Collaboration in Multiagent Systems [65.97255691640561]
We propose an adaptive training algorithm to develop effective collaboration between agents with ToM.
We evaluate our algorithms with two games, where our algorithm surpasses all previous decentralized execution algorithms without modeling ToM.
arXiv Detail & Related papers (2021-09-30T23:28:00Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent system, polices of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Multiagent Value Iteration Algorithms in Dynamic Programming and
Reinforcement Learning [0.0]
We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions.
In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order.
arXiv Detail & Related papers (2020-05-04T16:34:24Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes unpractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting.
Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.