In-Context Reinforcement Learning via Communicative World Models
- URL: http://arxiv.org/abs/2508.06659v1
- Date: Fri, 08 Aug 2025 19:23:23 GMT
- Title: In-Context Reinforcement Learning via Communicative World Models
- Authors: Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen
- Abstract summary: This work formulates ICRL as a two-agent emergent communication problem. It introduces CORAL, a framework that learns a transferable communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency.
- Score: 49.00028802135605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by decoupling latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not to maximize task reward, but to build a world model and distill its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and successfully perform zero-shot adaptation with the help of the pre-trained IA in entirely unseen sparse-reward environments, validating the efficacy of learning a transferable communicative representation.
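The abstract says the Causal Influence Loss "measures the effect that the message has on the next action" but gives no formula. A common way to operationalize message influence in emergent-communication work (an assumption here, not necessarily CORAL's definition) is the KL divergence between the action distribution conditioned on the message and the one computed without it:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def causal_influence(logits_with_msg, logits_without_msg):
    """KL( pi(a | s, m) || pi(a | s) ): one hypothetical way to measure
    how much message m shifts the next-action distribution."""
    p = softmax(logits_with_msg)
    q = softmax(logits_without_msg)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A message that leaves the action logits unchanged has zero influence;
# a message that shifts them has strictly positive influence.
uninformative = causal_influence(np.array([1.0, 2.0, 0.5]),
                                 np.array([1.0, 2.0, 0.5]))
informative = causal_influence(np.array([3.0, 0.0, 0.0]),
                               np.array([1.0, 2.0, 0.5]))
```

Maximizing such a term during IA pre-training would favor messages that actually change the CA's behavior, which matches the stated intent of shaping the protocol by behavioral effect.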
Related papers
- SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning [46.70182219204539]
We introduce SpeakRL, a reinforcement learning (RL) method that enhances agents' conversational capabilities by rewarding proactive interactions with users. We present a systematic analysis of reward design for conversational proactivity and propose a principled reward formulation for teaching agents to balance asking with acting.
arXiv Detail & Related papers (2025-12-15T10:08:53Z)
- Learning to Interact in World Latent for Team Coordination [53.51290193631586]
This work presents a novel representation learning framework, interactive world latent (IWoL), to facilitate team coordination in multi-agent reinforcement learning (MARL). Our key insight is to construct a learnable representation space that jointly captures inter-agent relations and task-specific world information by directly modeling communication protocols. Our representation can be used not only as an implicit latent for each agent, but also as an explicit message for communication.
arXiv Detail & Related papers (2025-09-29T22:13:39Z)
- Training a Generally Curious Agent [86.84089201249104]
Paprika is a fine-tuning approach that enables language models to develop general decision-making capabilities. Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context, without further gradient updates. Results suggest a promising path towards AI systems that can autonomously solve sequential decision-making problems.
arXiv Detail & Related papers (2025-02-24T18:56:58Z)
- Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents [6.402396836189286]
We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning. We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
arXiv Detail & Related papers (2024-12-16T06:53:00Z)
- DCMAC: Demand-aware Customized Multi-Agent Communication via Upper Bound Training [9.068971933560416]
We propose a Demand-aware Customized Multi-Agent Communication (DCMAC) protocol, which uses upper bound training to obtain the ideal policy. Experimental results reveal that DCMAC significantly outperforms the baseline algorithms in both unconstrained and communication-constrained scenarios.
arXiv Detail & Related papers (2024-09-11T09:23:27Z)
- On the Role of Emergent Communication for Social Learning in Multi-Agent Reinforcement Learning [0.0]
Social learning uses cues from experts to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks.
This paper proposes an unsupervised method based on the information bottleneck to capture both referential complexity and task-specific utility.
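The summary above names an information-bottleneck objective trading referential complexity against task-specific utility, without giving a formula. In variational treatments of the bottleneck, the complexity term I(X; Z) is commonly upper-bounded by the KL of a Gaussian message encoder to a standard-normal prior; the sketch below uses that bound as an assumption, not as this paper's exact formulation:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ): a standard variational
    upper bound on the referential-complexity term I(X; Z)."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

def ib_objective(task_utility, mu, log_var, beta=1e-2):
    """Information-bottleneck trade-off (sketch): minimize negative
    task utility plus beta times the complexity bound."""
    return -task_utility + beta * gaussian_kl(mu, log_var)

# An encoder sitting exactly at the prior pays zero complexity cost.
zero_kl = gaussian_kl(np.zeros(4), np.zeros(4))
```

The coefficient `beta` is the single knob that moves the protocol along the complexity-utility curve the summary describes.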
arXiv Detail & Related papers (2023-02-28T03:23:27Z)
- Universally Expressive Communication in Multi-Agent Reinforcement Learning [6.086083595135936]
We consider the question of whether a given communication protocol can express an arbitrary policy.
With standard GNN approaches provably limited in their expressive capacity, we consider augmenting agent observations with: (1) unique agent IDs and (2) random noise.
We provide a theoretical analysis as to how these approaches yield universally expressive communication, and also prove them capable of targeting arbitrary sets of actions for identical agents.
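The two augmentations named above can be sketched as simple observation transforms (function names and shapes are illustrative, not from the paper):

```python
import numpy as np

def augment_with_ids(obs, n_agents):
    """Append a one-hot agent ID to each agent's observation row,
    breaking the symmetry between otherwise identical agents."""
    ids = np.eye(n_agents)
    return np.concatenate([obs, ids], axis=1)

def augment_with_noise(obs, dim=4, rng=None):
    """Append i.i.d. Gaussian noise so identical agents become
    distinguishable (and individually targetable) with high probability."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal((obs.shape[0], dim))
    return np.concatenate([obs, noise], axis=1)

obs = np.zeros((3, 5))               # 3 identical agents, 5-dim observations
with_ids = augment_with_ids(obs, 3)  # shape (3, 8)
with_noise = augment_with_noise(obs) # shape (3, 9)
```

Either transform gives a GNN-based communication layer per-agent features to key on, which is what the expressivity argument requires.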
arXiv Detail & Related papers (2022-06-14T11:16:33Z)
- Provable Benefits of Representational Transfer in Reinforcement Learning [59.712501044999875]
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation.
We show that, given generative access to source tasks, we can discover a representation with which subsequent linear RL techniques quickly converge to a near-optimal policy.
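The pretrain-then-linear recipe in this summary can be illustrated with a toy sketch (random features stand in for the learned representation, and ridge regression stands in for "linear RL techniques"; none of this is the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a representation discovered on source tasks:
# a frozen nonlinear feature map phi(s).
W_phi = rng.standard_normal((3, 8))

def phi(states):
    return np.tanh(states @ W_phi)

# On a new target task, only a linear head is fit on top of the frozen
# features; if the task is realizable in phi, the fit is near-exact.
states = rng.standard_normal((100, 3))
w_true = rng.standard_normal(8)
values = phi(states) @ w_true        # target values, realizable in phi

F = phi(states)
w = np.linalg.solve(F.T @ F + 1e-6 * np.eye(8), F.T @ values)
mse = float(np.mean((F @ w - values) ** 2))
```

The point of the transfer result is exactly this division of labor: the hard representation learning happens once on source tasks, after which each new task reduces to a cheap linear problem.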
arXiv Detail & Related papers (2022-05-29T04:31:29Z)
- Common Language for Goal-Oriented Semantic Communications: A Curriculum Learning Framework [66.81698651016444]
A comprehensive semantic communications framework is proposed for enabling goal-oriented task execution.
A novel top-down framework that combines curriculum learning (CL) and reinforcement learning (RL) is proposed to solve this problem.
Simulation results show that the proposed CL method outperforms traditional RL in terms of convergence time, task execution time, and transmission cost during training.
arXiv Detail & Related papers (2021-11-15T19:13:55Z)
- Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations [59.608216900601384]
We study agents that learn to communicate via actuating their joints in a 3D environment.
We show that under realistic assumptions, a non-uniform distribution of intents and a common-knowledge energy cost, these agents can find protocols that generalize to novel partners.
arXiv Detail & Related papers (2020-10-29T19:23:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.