Multiagent Value Iteration Algorithms in Dynamic Programming and
Reinforcement Learning
- URL: http://arxiv.org/abs/2005.01627v1
- Date: Mon, 4 May 2020 16:34:24 GMT
- Title: Multiagent Value Iteration Algorithms in Dynamic Programming and
Reinforcement Learning
- Authors: Dimitri Bertsekas
- Abstract summary: We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions.
In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider infinite horizon dynamic programming problems, where the control
at each stage consists of several distinct decisions, each one made by one of
several agents. In an earlier work we introduced a policy iteration algorithm,
where the policy improvement is done one-agent-at-a-time in a given order, with
knowledge of the choices of the preceding agents in the order. As a result, the
amount of computation for each policy improvement grows linearly with the
number of agents, as opposed to exponentially for the standard
all-agents-at-once method. For the case of a finite-state discounted problem,
we showed convergence to an agent-by-agent optimal policy. In this paper, this
result is extended to value iteration and optimistic versions of policy
iteration, as well as to more general DP problems where the Bellman operator is
a contraction mapping, such as stochastic shortest path problems with all
policies being proper.
Related papers
- Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes [5.842054972839244]
We consider a cooperative multi-agent Markov decision process involving m agents.
In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents.
We propose approximate decentralized policy iteration algorithms using approximate linear programming with function approximation.
arXiv Detail & Related papers (2023-11-20T14:14:13Z) - Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Sample-Efficient Multi-Objective Learning via Generalized Policy
Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between texttts rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each agent, and thus cannot be shared with others.
The policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Process (MDP) respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent system, polices of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.