Scalable and Sample Efficient Distributed Policy Gradient Algorithms in
Multi-Agent Networked Systems
- URL: http://arxiv.org/abs/2212.06357v2
- Date: Sun, 14 May 2023 16:08:31 GMT
- Title: Scalable and Sample Efficient Distributed Policy Gradient Algorithms in
Multi-Agent Networked Systems
- Authors: Xin Liu, Honghao Wei, Lei Ying
- Abstract summary: We name it REC-MARL, standing for REward-Coupled Multi-Agent Reinforcement Learning.
REC-MARL has a range of important applications such as real-time access control and distributed power control in wireless networks.
- Score: 12.327745531583277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a class of multi-agent reinforcement learning (MARL)
problems where the reward that an agent receives depends on the states of other
agents, but the next state only depends on the agent's own current state and
action. We name it REC-MARL standing for REward-Coupled Multi-Agent
Reinforcement Learning. REC-MARL has a range of important applications such as
real-time access control and distributed power control in wireless networks.
This paper presents a distributed policy gradient algorithm for REC-MARL. The
proposed algorithm is distributed in two aspects: (i) the learned policy is a
distributed policy that maps a local state of an agent to its local action and
(ii) the learning/training is distributed, during which each agent updates its
policy based on its own and its neighbors' information. The proposed algorithm
converges to a stationary policy, and its iteration complexity bounds depend on the
dimensions of local states and actions. The experimental results of our
algorithm for the real-time access control and power control in wireless
networks show that our policy significantly outperforms the state-of-the-art
algorithms and well-known benchmarks.
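To make the two senses of "distributed" concrete, here is a minimal REINFORCE-style sketch of the setting in Python. It is not the paper's algorithm: the line-graph neighborhoods, local dynamics, reward coupling, and all hyperparameters are illustrative assumptions. Each agent owns a softmax policy over its local state (aspect (i)), and its update uses only its own and its neighbors' returns (aspect (ii)), which is sound here because under the REC-MARL structure an agent's trajectory can only influence rewards inside its own neighborhood.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_STATES, N_ACTIONS, HORIZON, LR = 4, 3, 2, 20, 0.05

# Assumed line-graph topology: a neighborhood is the agent plus adjacent agents.
neighbors = [[j for j in (k - 1, k, k + 1) if 0 <= j < N_AGENTS]
             for k in range(N_AGENTS)]

# Aspect (i): one tabular softmax policy per agent, over local states/actions only.
theta = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))

def policy(k, s):
    z = np.exp(theta[k, s] - theta[k, s].max())
    return z / z.sum()

def next_state(s, a):
    return (s + a) % N_STATES  # hypothetical dynamics: own state/action only

def reward(k, states):
    # Hypothetical coupling: agent k's reward depends on its neighbors' states.
    return -abs(states[k] - np.mean([states[j] for j in neighbors[k]]))

for episode in range(1000):
    states = rng.integers(N_STATES, size=N_AGENTS)
    score = np.zeros_like(theta)   # accumulated grad log pi_k along the episode
    returns = np.zeros(N_AGENTS)
    for t in range(HORIZON):
        actions = np.empty(N_AGENTS, dtype=int)
        for k in range(N_AGENTS):
            p = policy(k, states[k])
            actions[k] = rng.choice(N_ACTIONS, p=p)
            g = -p
            g[actions[k]] += 1.0   # softmax score function for the taken action
            score[k, states[k]] += g
        states = np.array([next_state(states[k], actions[k])
                           for k in range(N_AGENTS)])
        returns += [reward(k, states) for k in range(N_AGENTS)]
    for k in range(N_AGENTS):
        # Aspect (ii): agent k learns from its own and its neighbors' returns only,
        # since its trajectory cannot affect rewards outside its neighborhood.
        theta[k] += LR * sum(returns[j] for j in neighbors[k]) * score[k] / HORIZON
```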
Related papers
- PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods [0.0]
We introduce PG-Rainbow, a novel algorithm that combines a distributional reinforcement learning framework with a policy gradient algorithm.
We show empirical results that through the integration of reward distribution information into the policy network, the policy agent acquires enhanced capabilities.
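PG-Rainbow's actual architecture is specified in the paper; as a generic, assumed illustration of feeding reward-distribution information into a policy update, the sketch below maintains a quantile-regression critic and uses the mean of its quantiles as the advantage signal for a softmax policy gradient (the tabular environment and all hyperparameters are placeholders).

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES, N_ACTIONS, N_QUANT = 5, 2, 8
LR_Q, LR_PI, GAMMA = 0.1, 0.05, 0.9

taus = (np.arange(N_QUANT) + 0.5) / N_QUANT        # quantile midpoints
quants = np.zeros((N_STATES, N_ACTIONS, N_QUANT))  # distributional critic
theta = np.zeros((N_STATES, N_ACTIONS))            # softmax policy logits

def act(s):
    z = np.exp(theta[s] - theta[s].max())
    p = z / z.sum()
    return rng.choice(N_ACTIONS, p=p), p

s = 0
for step in range(20000):
    a, p = act(s)
    s2 = (s + a + 1) % N_STATES                    # hypothetical dynamics
    r = 1.0 if s2 == 0 else 0.0                    # hypothetical reward
    a2, _ = act(s2)
    target = r + GAMMA * quants[s2, a2].mean()     # bootstrapped scalar target
    # Quantile-regression step: each quantile moves by the pinball-loss gradient.
    for i in range(N_QUANT):
        quants[s, a, i] += LR_Q * (taus[i] - float(target < quants[s, a, i]))
    # Policy gradient driven by the critic's distribution: advantage = mean of
    # the chosen action's quantiles minus the policy-weighted mean over actions.
    adv = quants[s, a].mean() - p @ quants[s].mean(axis=1)
    g = -p
    g[a] += 1.0
    theta[s] += LR_PI * adv * g
    s = s2
```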
arXiv Detail & Related papers (2024-07-18T04:18:52Z)
- Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
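The precise penalty in PQL is defined in the paper; the sketch below shows one assumed reading of a penalty-based Q-learning update, in which a penalty is subtracted from the TD target whenever an agent deviates from its current greedy action, nudging its behavior toward being predictable to other agents (the environment and the penalty form are both placeholders).

```python
import numpy as np

rng = np.random.default_rng(2)
N_STATES, N_ACTIONS = 4, 3
LR, GAMMA, EPS, BETA = 0.1, 0.9, 0.1, 0.5   # BETA scales the assumed penalty

Q = np.zeros((N_STATES, N_ACTIONS))

def step_env(s, a):
    # Hypothetical stand-in for one transmitter's local power-control problem.
    s2 = (s + a) % N_STATES
    return s2, float(s2 == 0)

s = 0
for t in range(10000):
    greedy = int(Q[s].argmax())
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else greedy
    s2, r = step_env(s, a)
    # Assumed penalty form: deviating from the current greedy action is taxed,
    # keeping the agent's behavior easier for the other agents to model.
    penalty = BETA * float(a != greedy)
    Q[s, a] += LR * ((r - penalty) + GAMMA * Q[s2].max() - Q[s, a])
    s = s2
```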
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
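MEDAL's full procedure alternates a forward task policy with the backward policy; the sketch below isolates only the state-matching reward, using a logistic classifier between demonstration states and states visited by the backward policy (the data, classifier, and all names are illustrative assumptions, not the paper's exact construction).

```python
import numpy as np

rng = np.random.default_rng(3)
STATE_DIM = 4

# Demonstration states (given) vs. states the backward policy currently visits;
# both batches here are synthetic placeholders.
demo_states = rng.normal(0.0, 1.0, size=(256, STATE_DIM))
backward_states = rng.normal(1.0, 1.0, size=(256, STATE_DIM))

w, b = np.zeros(STATE_DIM), 0.0  # logistic classifier: demo = 1, backward = 0

def prob_demo(s):
    return 1.0 / (1.0 + np.exp(-(s @ w + b)))

def train_classifier(lr=0.05, epochs=200):
    global w, b
    X = np.vstack([demo_states, backward_states])
    y = np.concatenate([np.ones(len(demo_states)), np.zeros(len(backward_states))])
    for _ in range(epochs):
        err = prob_demo(X) - y        # gradient of the logistic loss w.r.t. logits
        w -= lr * X.T @ err / len(X)
        b -= lr * err.mean()

def backward_reward(s):
    # The backward policy is rewarded for reaching states the classifier finds
    # demonstration-like, i.e. for matching the demo state distribution.
    return float(np.log(prob_demo(s) + 1e-8))

train_classifier()
print(backward_reward(demo_states[0]), backward_reward(backward_states[0]))
```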
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent keeps its local policy private, so it cannot be shared with others.
The policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z)
- Regularize! Don't Mix: Multi-Agent Reinforcement Learning without Explicit Centralized Structures [8.883885464358737]
We propose Multi-Agent Regularized Q-learning (MARQ), which uses regularization for multi-agent reinforcement learning rather than learning explicit cooperative structures.
Our algorithm is evaluated on several benchmark multi-agent environments and we show that MARQ consistently outperforms several baselines and state-of-the-art algorithms.
arXiv Detail & Related papers (2021-09-19T00:58:38Z)
- Distributed Q-Learning with State Tracking for Multi-agent Networked Control [61.63442612938345]
This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network.
We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents.
arXiv Detail & Related papers (2020-12-22T22:03:49Z)
- Multi-Agent Trust Region Policy Optimization [34.91180300856614]
We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases.
We propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO).
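The consensus problem MATRPO actually solves is derived in the paper; the sketch below only illustrates the general pattern of distributed consensus optimization, where each agent takes a local trust-region-limited step and then averages its policy parameters with its neighbors under a doubly-stochastic mixing matrix (the ring topology, weights, and local objective are all assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)
N_AGENTS, DIM, ROUNDS = 4, 6, 25

# Doubly-stochastic mixing weights on a ring (an assumption; MATRPO derives its
# own consensus formulation and weights from the TRPO objective).
W = np.zeros((N_AGENTS, N_AGENTS))
for k in range(N_AGENTS):
    W[k, k] = 0.5
    W[k, (k - 1) % N_AGENTS] = 0.25
    W[k, (k + 1) % N_AGENTS] = 0.25

params = rng.normal(size=(N_AGENTS, DIM))  # each agent's local policy parameters

def local_step(k, x, max_step=0.3):
    # Stand-in for agent k's trust-region-limited local update direction.
    grad = -x                               # hypothetical local objective gradient
    norm = np.linalg.norm(grad)
    return x + grad * min(1.0, max_step / (norm + 1e-12))

for _ in range(ROUNDS):
    # Each round: a local step per agent, then consensus averaging with neighbors.
    params = np.array([local_step(k, params[k]) for k in range(N_AGENTS)])
    params = W @ params
print(params.round(3))  # rows drift toward a common (consensus) policy
```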
arXiv Detail & Related papers (2020-10-15T17:49:47Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)