Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings
- URL: http://arxiv.org/abs/2602.12520v1
- Date: Fri, 13 Feb 2026 01:57:21 GMT
- Title: Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings
- Authors: Zhizun Wang, David Meger,
- Abstract summary: We present a novel model-based multi-agent reinforcement learning framework.<n>We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding.<n>By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes.
- Score: 10.36125908359289
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning to coordinate many agents in partially observable and highly dynamic environments requires both informative representations and data-efficient training. To address this challenge, we present a novel model-based multi-agent reinforcement learning framework that unifies joint state-action representation learning with imaginative roll-outs. We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding (SALE). SALE is injected into both the imagination module that forecasts plausible future roll-outs and the joint agent network whose individual action values are combined through a mixing network to estimate the joint action-value function. By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes, leading to improved long-term planning and optimization under limited real-environment interactions. Empirical studies on well-established multi-agent benchmarks, including StarCraft II Micro-Management, Multi-Agent MuJoCo, and Level-Based Foraging challenges, demonstrate consistent gains of our method over baseline algorithms and highlight the effectiveness of joint state-action learned embeddings within a multi-agent model-based paradigm.
Related papers
- Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality [59.651410243721045]
CoCoA is a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization.<n>We introduce an EOS-based reconstruction task, encouraging the model to reconstruct input from the corresponding EOS> embeddings.<n>Experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality.
arXiv Detail & Related papers (2026-03-02T05:34:45Z) - MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning [53.37068897861388]
MedSAM-Agent is a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process.<n>We develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification.<n>Experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-03T09:47:49Z) - Scalable Multiagent Reinforcement Learning with Collective Influence Estimation [5.050035210247092]
This paper proposes a multiagent learning framework augmented with a Collective Influence Estimation Network.<n>By explicitly modeling the collective influence of other agents on the task object, each agent can infer critical interaction information.<n> Experimental results show that the proposed method achieves stable and efficient coordination under communication-limited environments.
arXiv Detail & Related papers (2026-01-13T04:24:11Z) - Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning framework.<n>USDRL consists of a Transformer-based Dense Spatio-Temporal (DSTE), Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT)
arXiv Detail & Related papers (2025-08-18T02:42:16Z) - Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective [54.77404771454794]
We develop a flexible and robust world model for Multi-Agent Reinforcement Learning (MARL) using diffusion models.<n>Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks.
arXiv Detail & Related papers (2025-05-27T09:11:38Z) - Cooperative Multi-Agent Planning with Adaptive Skill Synthesis [16.228784877899976]
We present a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized closed-loop decision-making.<n>The skill library, bootstrapped from demonstrations, evolves via planner-guided tasks to enable adaptive strategies.<n>We demonstrate its strong performance against state-of-the-art MARL baselines across both symmetric and asymmetric scenarios.
arXiv Detail & Related papers (2025-02-14T13:23:18Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model [10.36125908359289]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.<n>Our method achieves high sample efficiency and exhibits superior performance compared to other baselines across a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-09-08T22:12:43Z) - Learning in Cooperative Multiagent Systems Using Cognitive and Machine
Models [1.0742675209112622]
Multi-Agent Systems (MAS) are critical for many applications requiring collaboration and coordination with humans.
One major challenge is the simultaneous learning and interaction of independent agents in dynamic environments.
We propose three variants of Multi-Agent IBL models (MAIBL)
We demonstrate that the MAIBL models exhibit faster learning and achieve better coordination in a dynamic CMOTP task with various settings of rewards compared to current MADRL models.
arXiv Detail & Related papers (2023-08-18T00:39:06Z) - MA2CL:Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called textbfMulti-textbfAgent textbfMasked textbfAttentive textbfContrastive textbfLearning (MA2CL)
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Modeling the Interaction between Agents in Cooperative Multi-Agent
Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named as interactive actor-critic(IAC)
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.