Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments
- URL: http://arxiv.org/abs/2507.13846v1
- Date: Fri, 18 Jul 2025 11:59:55 GMT
- Title: Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments
- Authors: Kathrin Korte, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese,
- Abstract summary: Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors.<n>Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt.<n>This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment.
- Score: 1.2787026473187368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: [Context] Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors. However, transferring knowledge across agents remains challenging in non-stationary environments with changing goals. [Problem] Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt. [Approach] This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment. As the environment changes (new obstacles), agents' collisions require adaptive recovery strategies. We model each collision as a causal intervention instantiated as a sequence of recovery actions (a macro) whose effect corresponds to a causal knowledge of how to circumvent the obstacle while increasing the chances of achieving the agent's goal (maximizing cumulative reward). This recovery action macro is transferred online from a second agent and is applied in a zero-shot fashion, i.e., without retraining, just by querying a lookup model with local context information (collisions). [Results] Our findings reveal two key insights: (1) agents with heterogeneous goals were able to bridge about half of the gap between random exploration and a fully retrained policy when adapting to new environments, and (2) the impact of causal knowledge transfer depends on the interplay between environment complexity and agents' heterogeneous goals.
Related papers
- Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention.<n>Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories.<n>We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z) - Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems [12.471774408499817]
Agentic AI aims to create systems that set their own goals, adapt proactively to change, and refine behavior through continuous experience.<n>Recent advances suggest that, when facing multiple and unforeseen tasks, agents could benefit from sharing machine-learned knowledge and reuse policies that have already been fully or partially learned by other agents.<n>This study explores how an agent decides what knowledge to select, from whom, and when and how to integrate it in its own policy in order to accelerate its own learning.
arXiv Detail & Related papers (2025-06-05T20:38:11Z) - Causal Mean Field Multi-Agent Reinforcement Learning [10.767740092703777]
A framework named mean-field reinforcement learning (MFRL) could alleviate the scalability problem by employing the Mean Field Theory.<n>This framework lacks the ability to identify essential interactions under nonstationary environments.<n>We propose an algorithm called causal mean-field Q-learning (CMFQ) to address the scalability problem.
arXiv Detail & Related papers (2025-02-20T02:15:58Z) - COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping is where the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions.<n>Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans.<n>We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z) - Factorised Active Inference for Strategic Multi-Agent Interactions [1.9389881806157316]
Two complementary approaches can be integrated to this end.<n>The Active Inference framework (AIF) describes how agents employ a generative model to adapt their beliefs about and behaviour within their environment.<n>Game theory formalises strategic interactions between agents with potentially competing objectives.<n>We propose a factorisation of the generative model whereby each agent maintains explicit, individual-level beliefs about the internal states of other agents, and uses them for strategic planning in a joint context.
arXiv Detail & Related papers (2024-11-11T21:04:43Z) - Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaption across tasks.<n>In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change.<n>We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL)
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Multi-Agent Transfer Learning in Reinforcement Learning-Based
Ride-Sharing Systems [3.7311680121118345]
Reinforcement learning (RL) has been used in a range of simulated real-world tasks.
In this paper we investigate the impact of TL transfer parameters with fixed source and target roles.
arXiv Detail & Related papers (2021-12-01T11:23:40Z) - Relative Distributed Formation and Obstacle Avoidance with Multi-agent
Reinforcement Learning [20.401609420707867]
We propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL)
Our method achieves better performance regarding formation error, formation convergence rate and on-par success rate of obstacle avoidance compared with baselines.
arXiv Detail & Related papers (2021-11-14T13:02:45Z) - Language-guided Navigation via Cross-Modal Grounding and Alternate
Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The main challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically-varying visual environments.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z) - RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
arXiv Detail & Related papers (2020-10-04T09:20:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.