Related papers: A Roadmap Towards Improving Multi-Agent Reinforcement Learning With Causal Discovery And Inference

URL: http://arxiv.org/abs/2503.17803v1
Date: Sat, 22 Mar 2025 15:49:13 GMT
Title: A Roadmap Towards Improving Multi-Agent Reinforcement Learning With Causal Discovery And Inference
Authors: Giovanni Briglia, Stefano Mariani, Franco Zambonelli
Abstract summary: Causal reasoning is increasingly used in Reinforcement Learning (RL) to improve the learning process. However, applications of causal reasoning to Multi-Agent RL (MARL) are still mostly unexplored. We take the first step in investigating the opportunities and challenges of applying causal reasoning in MARL.
Score: 0.24578723416255746
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Causal reasoning is increasingly used in Reinforcement Learning (RL) to improve the learning process in several dimensions: efficacy of learned policies, efficiency of convergence, generalisation capabilities, safety and interpretability of behaviour. However, applications of causal reasoning to Multi-Agent RL (MARL) are still mostly unexplored. In this paper, we take the first step in investigating the opportunities and challenges of applying causal reasoning in MARL. We measure the impact of a simple form of causal augmentation in state-of-the-art MARL scenarios increasingly requiring cooperation, and with state-of-the-art MARL algorithms exploiting various degrees of collaboration between agents. Then, we discuss the positive as well as negative results achieved, giving us the chance to outline the areas where further research may help to successfully transfer causal RL to the multi-agent setting.

Related papers

Reasoning with Exploration: An Entropy Perspective on Reinforcement Learning for LLMs [112.40801692473723]
Balancing exploration and exploitation is a central goal in reinforcement learning (RL). We introduce a minimal modification to standard RL with only one line of code: augmenting the advantage function with an entropy-based term. Our method achieves significant gains on the Pass@K metric, even when evaluated with extremely large K values.
arXiv Detail & Related papers (2025-06-17T17:54:03Z)
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO [91.25793883692036]
We aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL). We propose Share-GRPO, a novel RL approach that tackles these issues by exploring and sharing diverse reasoning trajectories over an expanded question space. In addition, Share-GRPO shares reward information during advantage computation, which estimates solution advantages hierarchically across and within question variants.
arXiv Detail & Related papers (2025-05-22T13:39:32Z)
A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large language model (LLM) reasoning. We analyze scaling in reasoning steps, which improves multi-step inference and logical consistency. We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
Regularized Multi-LLMs Collaboration for Enhanced Score-based Causal Discovery [13.654021365091305]
We explore the potential of using large language models (LLMs) to enhance causal discovery approaches. We propose a general framework to utilise the capacity of not only one but multiple LLMs to augment the discovery process.
arXiv Detail & Related papers (2024-11-27T01:56:21Z)
Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper is devoted to a comprehensive review of realizing the sample efficiency and generalization of RL algorithms through transfer and inverse reinforcement learning (T-IRL). Our findings indicate that a majority of recent research works have dealt with the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies. Under the IRL structure, training schemes that require a low number of experience transitions and extension of such frameworks to multi-agent and multi-intention problems have been the priority of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL). Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning [18.054709749075194]
We propose a novel MARL algorithm named Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning (SCIC). Our approach aims to detect inter-agent causal influences in specific situations based on a criterion using causal intervention and conditional mutual information. The resulting update links coordinated exploration and intrinsic reward distribution, which enhances overall collaboration and performance.
arXiv Detail & Related papers (2023-12-15T05:09:32Z)
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
arXiv Detail & Related papers (2023-08-23T04:59:21Z)
Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL). It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
A Survey on Causal Reinforcement Learning [41.645270300009436]
We offer a review of Causal Reinforcement Learning (CRL) works and methods, and investigate the potential functionality from causality toward RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), Partially Observed Markov Decision Process (POMDP), Multi-Arm Bandits (MAB), and Dynamic Treatment Regime (DTR).
arXiv Detail & Related papers (2023-02-10T12:25:08Z)
Causal Multi-Agent Reinforcement Learning: Review and Open Problems [5.0519220616720295]
This paper serves to introduce the reader to the field of multi-agent reinforcement learning (MARL). We highlight key challenges in MARL and discuss these in the context of how causal methods may assist in tackling them.
arXiv Detail & Related papers (2021-11-12T13:44:31Z)
KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning [16.167201058368303]
We present KnowSR, an adaptation method for the majority of multi-agent reinforcement learning (MARL) algorithms. We employ the idea of knowledge distillation (KD) to share knowledge among agents to shorten the training phase. To empirically demonstrate the robustness and effectiveness of KnowSR, we performed extensive experiments on state-of-the-art MARL algorithms in collaborative and competitive scenarios.
arXiv Detail & Related papers (2021-05-25T02:19:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.