Low-Rank Agent-Specific Adaptation (LoRASA) for Multi-Agent Policy Learning
- URL: http://arxiv.org/abs/2502.05573v1
- Date: Sat, 08 Feb 2025 13:57:53 GMT
- Title: Low-Rank Agent-Specific Adaptation (LoRASA) for Multi-Agent Policy Learning
- Authors: Beining Zhang, Aditya Kapoor, Mingfei Sun
- Abstract summary: Multi-agent reinforcement learning (MARL) often relies on parameter sharing (PS) to scale efficiently.
We propose Low-Rank Agent-Specific Adaptation (LoRASA), a novel approach that treats each agent's policy as a specialized "task" fine-tuned from a shared backbone.
We evaluate LoRASA on challenging benchmarks including the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo (MAMuJoCo).
- Score: 3.333453555166201
- License:
- Abstract: Multi-agent reinforcement learning (MARL) often relies on \emph{parameter sharing (PS)} to scale efficiently. However, purely shared policies can stifle each agent's unique specialization, reducing overall performance in heterogeneous environments. We propose \textbf{Low-Rank Agent-Specific Adaptation (LoRASA)}, a novel approach that treats each agent's policy as a specialized ``task'' fine-tuned from a shared backbone. Drawing inspiration from parameter-efficient transfer methods, LoRASA appends small, low-rank adaptation matrices to each layer of the shared policy, naturally inducing \emph{parameter-space sparsity} that promotes both specialization and scalability. We evaluate LoRASA on challenging benchmarks including the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo (MAMuJoCo), implementing it atop widely used algorithms such as MAPPO and A2PO. Across diverse tasks, LoRASA matches or outperforms existing baselines \emph{while reducing memory and computational overhead}. Ablation studies on adapter rank, placement, and timing validate the method's flexibility and efficiency. Our results suggest LoRASA's potential to establish a new norm for MARL policy parameterization: combining a shared foundation for coordination with low-rank agent-specific refinements for individual specialization.
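The adapter idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `SharedLinearWithAgentAdapters`, the layer sizes, the rank, and the zero initialization of the `B` factor (borrowed from common low-rank adaptation practice) are all assumptions made for illustration. Each agent's output is the shared layer's output plus a low-rank, agent-specific correction.

```python
import numpy as np


class SharedLinearWithAgentAdapters:
    """One shared linear layer plus a low-rank adapter per agent (illustrative only)."""

    def __init__(self, d_in, d_out, n_agents, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone parameters, used by every agent.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_out, d_in))
        self.b = np.zeros(d_out)
        # Agent-specific low-rank factors: delta_W_i = B_i @ A_i.
        # Assumption: B starts at zero so every agent begins at the shared policy.
        self.A = rng.normal(0.0, 0.01, size=(n_agents, rank, d_in))
        self.B = np.zeros((n_agents, d_out, rank))

    def forward(self, x, agent_id):
        shared = self.W @ x + self.b
        # Apply A first, then B, so the full d_out x d_in update is never materialized.
        delta = self.B[agent_id] @ (self.A[agent_id] @ x)
        return shared + delta

    def adapter_params_per_agent(self):
        _, rank, d_in = self.A.shape
        d_out = self.B.shape[1]
        return rank * (d_in + d_out)

    def full_layer_params(self):
        return self.W.size + self.b.size


layer = SharedLinearWithAgentAdapters(d_in=64, d_out=64, n_agents=3, rank=4)
obs = np.ones(64)

# With B initialized to zero, all agents produce the shared output.
same_at_init = np.allclose(layer.forward(obs, 0), layer.forward(obs, 1))

# Stand-in for a training update that touches only agent 1's adapter.
layer.A[1][:] = 0.1
layer.B[1][:] = 0.1
diverged = not np.allclose(layer.forward(obs, 0), layer.forward(obs, 1))

print(same_at_init, diverged)
print(layer.adapter_params_per_agent(), layer.full_layer_params())
```

Under these assumed sizes, each agent adds rank * (d_in + d_out) = 512 parameters, compared with 4160 for a separate copy of the full layer, which is the kind of saving over fully unshared policies that the abstract alludes to.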
Related papers
- SDPO: Segment-Level Direct Preference Optimization for Social Agents [56.970902914217156]
Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues.
We propose Segment-Level Direct Preference Optimization (SDPO) to optimize multi-turn agent behavior while minimizing training noise.
arXiv Detail & Related papers (2025-01-03T14:09:46Z) - Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space [22.535906675532196]
In a multi-agent system, action semantics indicates the different influences of agents' actions toward other entities.
Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents.
We introduce the Unified Action Space (UAS) to fulfill the requirement.
arXiv Detail & Related papers (2024-08-14T09:15:11Z) - Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration [5.326588461041464]
Multi-agent reinforcement learning (MARL) is transforming fields like autonomous vehicle networks.
Flexibly updating MARL strategies for different roles as the team scale changes is still a challenge for current MARL frameworks.
We propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO).
We show SHPPO exhibits superior performance in classic MARL environments like the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF).
arXiv Detail & Related papers (2024-04-05T03:02:57Z) - Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z) - RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z) - Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees [43.10380224532313]
We study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm.
We propose and analyze a class of coordinated actor-critic algorithms (CAC) in which individually parametrized policies have a shared part and a personalized part.
This work provides the first finite-sample guarantee for decentralized AC algorithms with partially personalized policies.
arXiv Detail & Related papers (2021-10-11T20:26:16Z) - MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z) - Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)