Parameter Sharing with Network Pruning for Scalable Multi-Agent Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2303.00912v1
- Date: Thu, 2 Mar 2023 02:17:14 GMT
- Title: Parameter Sharing with Network Pruning for Scalable Multi-Agent Deep
Reinforcement Learning
- Authors: Woojun Kim, Youngchul Sung
- Abstract summary: We propose a simple method that adopts structured pruning for a deep neural network to increase the representational capacity of the joint policy without introducing additional parameters.
We evaluate the proposed method on several benchmark tasks, and numerical results show that the proposed method significantly outperforms other parameter-sharing methods.
- Score: 20.35644044703191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scalability is one of the essential challenges for multi-agent
reinforcement learning (MARL) algorithms to be applied to real-world problems,
which typically involve a massive number of agents. To this end, parameter
sharing across multiple agents has been widely used, since it reduces training
time by decreasing the number of parameters and improving sample efficiency.
However, using the same parameters across agents limits the representational
capacity of the joint policy; consequently, performance can degrade in
multi-agent tasks that require different behaviors for different agents. In
this paper, we propose a simple method that adopts structured pruning for a
deep neural network to increase the representational capacity of the joint
policy without introducing additional parameters. We evaluate the proposed
method on several benchmark tasks, and numerical results show that it
significantly outperforms other parameter-sharing methods.
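The core idea — fully shared weights plus agent-specific structured masks — can be illustrated with a toy sketch. This is not the paper's training procedure; the layer sizes and the windowed masking rule below are hypothetical, chosen only to show how pruning different hidden units per agent yields distinct sub-networks with zero extra parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a single shared hidden layer.
OBS_DIM, HIDDEN, N_AGENTS = 8, 16, 4
W_shared = rng.standard_normal((OBS_DIM, HIDDEN))  # shared by all agents

def agent_mask(agent_id, keep=12):
    """Structured mask: each agent keeps a different (wrapping) window of
    hidden units. A toy stand-in for learned structured pruning."""
    idx = (np.arange(keep) + agent_id * (HIDDEN // N_AGENTS)) % HIDDEN
    mask = np.zeros(HIDDEN)
    mask[idx] = 1.0
    return mask

def forward(obs, agent_id):
    # All agents use the same W_shared; only the pruning mask differs,
    # so the joint policy gains diversity without new parameters.
    return np.tanh(obs @ W_shared) * agent_mask(agent_id)

obs = rng.standard_normal(OBS_DIM)
h0, h1 = forward(obs, 0), forward(obs, 1)
# h0 and h1 differ wherever the two agents' masks disagree, even though
# the observation and all shared weights are identical.
```

The trainable parameter count stays at OBS_DIM × HIDDEN regardless of N_AGENTS; only the binary masks, which carry no weights of their own, distinguish the agents.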
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Efficient Pareto Manifold Learning with Low-Rank Structure [31.082432589391953]
Multi-task learning is inherently a multi-objective optimization problem.
We propose a novel approach that integrates a main network with several low-rank matrices.
It significantly reduces the number of parameters and facilitates the extraction of shared features.
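As a rough illustration of that design (not the paper's implementation; the sizes and scaling factors are made up), a shared main weight plus per-task rank-r factors costs r·(d_in + d_out) parameters per task instead of d_in·d_out:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, RANK, N_TASKS = 32, 32, 2, 3

W_main = rng.standard_normal((D_IN, D_OUT))  # shared main network weight
# Per-task low-rank corrections A_t @ B_t, LoRA-style (hypothetical 0.1 scaling).
A = [0.1 * rng.standard_normal((D_IN, RANK)) for _ in range(N_TASKS)]
B = [0.1 * rng.standard_normal((RANK, D_OUT)) for _ in range(N_TASKS)]

def task_forward(x, t):
    # Shared features come from W_main; the cheap rank-2 term
    # specializes the layer for task t.
    return x @ (W_main + A[t] @ B[t])

per_task_full = D_IN * D_OUT               # 1024: a separate full matrix per task
per_task_lowrank = RANK * (D_IN + D_OUT)   # 128: the low-rank alternative
```

With these toy sizes, each extra task costs 128 parameters instead of 1024, while all tasks read features through the same main matrix.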
arXiv Detail & Related papers (2024-07-30T11:09:27Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
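The UCB-driven search can be sketched as a standard UCB1 bandit over candidate weight vectors. Everything below — the candidate set, the noisy `evaluate()` stand-in for training a policy under a utility, the constants — is hypothetical; only the selection rule reflects the stated idea:

```python
import math
import random

random.seed(0)

# Hypothetical candidate weight vectors for a 2-objective MORL problem.
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
counts = [0] * len(candidates)
totals = [0.0] * len(candidates)

def ucb_pick(t, c=1.0):
    # Standard UCB1: try each arm once, then trade off mean reward
    # against an exploration bonus that shrinks with visit count.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(candidates)),
               key=lambda i: totals[i] / counts[i]
                             + c * math.sqrt(math.log(t) / counts[i]))

def evaluate(w):
    # Stand-in for training/evaluating a policy under utility w·r;
    # in this toy setup the balanced vector is secretly best.
    return 1.0 - abs(w[0] - 0.5) + random.gauss(0, 0.05)

for t in range(1, 201):
    i = ucb_pick(t)
    counts[i] += 1
    totals[i] += evaluate(candidates[i])

best = max(range(len(candidates)), key=lambda i: totals[i] / counts[i])
# After 200 rounds, the balanced weight vector should accumulate the
# highest mean reward and the most pulls.
```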
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Adaptive parameter sharing for multi-agent reinforcement learning [16.861543418593044]
We propose a novel parameter sharing method inspired by research pertaining to the brain in biology.
It maps each type of agent to different regions within a shared network based on their identity, resulting in distinct subnetworks.
Our method can increase the diversity of strategies among different agents without additional training parameters.
arXiv Detail & Related papers (2023-12-14T15:00:32Z)
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to the sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision-making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs…
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization to the Centralized Training with Decentralized Execution (CTDE) framework with the Actor-Critic (AC) structure.
arXiv Detail & Related papers (2023-09-26T07:38:19Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agent's behavior difference and build its relationship with the policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization on three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling [13.915157044948364]
One of the preeminent obstacles to scaling multi-agent reinforcement learning is assigning credit to individual agents' actions.
In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD).
PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment.
arXiv Detail & Related papers (2021-12-23T17:48:04Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
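The key architectural property — invariance to agent ordering — is easy to demonstrate with a mean-pooling sketch. The sizes and the single-layer network below are hypothetical, not the MF-PPO architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, EMBED = 4, 8

W_embed = rng.standard_normal((OBS_DIM, EMBED))
w_out = rng.standard_normal(EMBED)

def pooled_value(agent_obs):
    # Embed each agent's observation, then mean-pool across agents:
    # the mean does not depend on the order in which agents are listed,
    # so the output is permutation-invariant by construction.
    h = np.tanh(agent_obs @ W_embed)  # shape (n_agents, EMBED)
    return float(h.mean(axis=0) @ w_out)

obs = rng.standard_normal((5, OBS_DIM))
v_fwd = pooled_value(obs)
v_rev = pooled_value(obs[::-1])  # same agents, reversed order
# v_fwd == v_rev (up to float round-off): the network cannot distinguish
# permutations of the same set of agents.
```

Any symmetric pooling operator (mean, sum, max) gives the same invariance; the inductive bias comes from pooling, not from the particular embedding.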
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing [4.855663359344748]
Sharing parameters in deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents.
However, having all agents share the same parameters can also have a detrimental effect on learning.
We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals.
arXiv Detail & Related papers (2021-02-15T11:33:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.