Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2109.11692v1
- Date: Thu, 23 Sep 2021 23:38:15 GMT
- Title: Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning
- Authors: Carlo Alfano, Patrick Rebeschini
- Abstract summary: We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
- Score: 22.310861786709538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperative multi-agent reinforcement learning is a decentralized paradigm in
sequential decision making where agents distributed over a network iteratively
collaborate with neighbors to maximize global (network-wide) notions of
rewards. Exact computations typically involve a complexity that scales
exponentially with the number of agents. To address this curse of
dimensionality, we design a scalable algorithm based on the Natural Policy
Gradient framework that uses local information and only requires agents to
communicate with neighbors within a certain range. Under standard assumptions
on the spatial decay of correlations for the transition dynamics of the
underlying Markov process and the localized learning policy, we show that our
algorithm converges to the globally optimal policy with a dimension-free
statistical and computational complexity, incurring a localization error that
does not depend on the number of agents and converges to zero exponentially
fast as a function of the range of communication.
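To make the localized update concrete, below is a minimal Python sketch of a single agent's natural-policy-gradient step, assuming a tabular softmax parameterization and a Q-estimate truncated to a kappa-hop neighborhood; the class and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the action dimension.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

class LocalAgent:
    """Hypothetical agent holding a localized softmax policy
    pi_i(a_i | states of agents within kappa hops)."""

    def __init__(self, n_local_states, n_actions, step_size=0.1):
        self.theta = np.zeros((n_local_states, n_actions))  # policy logits
        self.eta = step_size

    def policy(self, s_local):
        return softmax(self.theta[s_local])

    def npg_update(self, q_hat):
        # For tabular softmax policies, the natural gradient step reduces
        # to shifting the logits by the (estimated) Q-values; per-state
        # constants cancel inside the softmax. Here q_hat is assumed to be
        # built only from rewards and states of agents within the kappa-hop
        # communication range.
        self.theta += self.eta * q_hat
```

Under the paper's spatial-decay assumptions, the gap between such a truncated estimate and the true global Q-function is exactly the localization error described above, vanishing exponentially fast in the communication range.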
Related papers
- Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design [3.5527561584422465]
We present the AlterNAting COordination and Network-Design Algorithm (Anaconda).
Anaconda is a scalable algorithm that also enjoys near-optimality guarantees.
We demonstrate Anaconda in simulated scenarios of area monitoring and compare it with a state-of-the-art algorithm.
arXiv Detail & Related papers (2024-09-02T18:11:33Z)
- Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation [59.01527054553122]
Decentralised agents can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system.
We introduce function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method.
We additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood.
arXiv Detail & Related papers (2024-08-21T13:32:46Z)
- Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph [18.04270684579841]
Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks assume undirected coordination graphs and communication graphs.
We propose a distributed RL algorithm where the local policy evaluations are based on local value functions.
arXiv Detail & Related papers (2022-01-10T04:14:46Z)
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into the agents' marginals, and the state is fully observable.
We develop multi-agent extensions whereby agents solve their local saddle point problems and then perform local weighted averaging (a generic sketch of this averaging step appears after this list).
We establish that the sample complexity for obtaining near-globally optimal solutions exhibits tight dependence on the cardinality of the state and action spaces.
arXiv Detail & Related papers (2021-10-22T03:48:41Z)
- Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents.
The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
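As a generic illustration of the local weighted averaging step mentioned in the randomized-linear-programming entry above, the following sketch runs consensus rounds over a mixing matrix; the matrix, names, and toy values are assumptions for the example, not that paper's algorithm.

```python
import numpy as np

def consensus_round(values, mixing):
    # One round of local weighted averaging: each agent replaces its
    # iterate (a row of `values`) with a weighted average of its
    # neighbors' iterates. `mixing` is a doubly stochastic matrix whose
    # (i, j) entry is nonzero only for graph neighbors i and j.
    return mixing @ values

# Toy usage on a fully connected 3-agent network: repeated rounds drive
# every row toward the network-wide average.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([[1.0], [2.0], [6.0]])
for _ in range(50):
    x = consensus_round(x, W)
print(x)  # each entry is close to the mean, 3.0
```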
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.