Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2109.11692v1
- Date: Thu, 23 Sep 2021 23:38:15 GMT
- Title: Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning
- Authors: Carlo Alfano, Patrick Rebeschini
- Abstract summary: We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
- Score: 22.310861786709538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperative multi-agent reinforcement learning is a decentralized paradigm in
sequential decision making where agents distributed over a network iteratively
collaborate with neighbors to maximize global (network-wide) notions of
rewards. Exact computations typically involve a complexity that scales
exponentially with the number of agents. To address this curse of
dimensionality, we design a scalable algorithm based on the Natural Policy
Gradient framework that uses local information and only requires agents to
communicate with neighbors within a certain range. Under standard assumptions
on the spatial decay of correlations for the transition dynamics of the
underlying Markov process and the localized learning policy, we show that our
algorithm converges to the globally optimal policy with a dimension-free
statistical and computational complexity, incurring a localization error that
does not depend on the number of agents and converges to zero exponentially
fast as a function of the range of communication.
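To make the localized update concrete, below is a minimal Python sketch of a single agent's natural-policy-gradient step, assuming a tabular softmax parameterization and a Q-estimate truncated to a kappa-hop neighborhood; the class and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the action dimension.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

class LocalAgent:
    """Hypothetical agent holding a localized softmax policy
    pi_i(a_i | states of agents within kappa hops)."""

    def __init__(self, n_local_states, n_actions, step_size=0.1):
        self.theta = np.zeros((n_local_states, n_actions))  # policy logits
        self.eta = step_size

    def policy(self, s_local):
        return softmax(self.theta[s_local])

    def npg_update(self, q_hat):
        # For tabular softmax policies, the natural gradient step reduces
        # to shifting the logits by the (estimated) Q-values; per-state
        # constants cancel inside the softmax. Here q_hat is assumed to be
        # built only from rewards and states of agents within the kappa-hop
        # communication range.
        self.theta += self.eta * q_hat
```

Under the paper's spatial-decay assumptions, the gap between such a truncated estimate and the true global Q-function is exactly the localization error described above, vanishing exponentially fast in the communication range.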
Related papers
- Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design [3.5527561584422465]
We present the AlterNAting COordination and Network-Design Algorithm (Anaconda).
Anaconda is a scalable algorithm that also enjoys near-optimality guarantees.
We demonstrate Anaconda in simulated scenarios of area monitoring and compare it with a state-of-the-art algorithm.
arXiv Detail & Related papers (2024-09-02T18:11:33Z)
- Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation [59.01527054553122]
Decentralised agents can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system.
We introduce function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method.
We additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood.
arXiv Detail & Related papers (2024-08-21T13:32:46Z)
- Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph [18.04270684579841]
Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks assume undirected coordination graphs and communication graphs.
We propose a distributed RL algorithm where the local policy evaluations are based on local value functions.
arXiv Detail & Related papers (2022-01-10T04:14:46Z)
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into the agents' marginals, and the state is fully observable.
We develop multi-agent extensions whereby agents solve their local saddle point problems and then perform local weighted averaging (a generic sketch of this averaging step appears after this list).
We establish that the sample complexity for obtaining near-globally optimal solutions exhibits tight dependence on the cardinality of the state and action spaces.
arXiv Detail & Related papers (2021-10-22T03:48:41Z)
- Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents.
The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
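As a generic illustration of the local weighted averaging step mentioned in the randomized-linear-programming entry above, the following sketch runs consensus rounds over a mixing matrix; the matrix, names, and toy values are assumptions for the example, not that paper's algorithm.

```python
import numpy as np

def consensus_round(values, mixing):
    # One round of local weighted averaging: each agent replaces its
    # iterate (a row of `values`) with a weighted average of its
    # neighbors' iterates. `mixing` is a doubly stochastic matrix whose
    # (i, j) entry is nonzero only for graph neighbors i and j.
    return mixing @ values

# Toy usage on a fully connected 3-agent network: repeated rounds drive
# every row toward the network-wide average.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([[1.0], [2.0], [6.0]])
for _ in range(50):
    x = consensus_round(x, W)
print(x)  # each entry is close to the mean, 3.0
```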
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.