Related papers: Language-conditioned world model improves policy generalization by reading environmental descriptions

URL: http://arxiv.org/abs/2511.22904v1
Date: Fri, 28 Nov 2025 06:13:27 GMT
Title: Language-conditioned world model improves policy generalization by reading environmental descriptions
Authors: Anh Nguyen, Stefan Lee
Abstract summary: To interact effectively with humans in the real world, it is important for agents to understand language that describes the dynamics of the environment. We propose a model-based reinforcement learning approach, where a language-conditioned world model is trained through interaction with the environment. We show that policies trained with LED-WM generalize more effectively to unseen games described by novel dynamics and language.
Score: 20.07554058793324
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To interact effectively with humans in the real world, it is important for agents to understand language that describes the dynamics of the environment--that is, how the environment behaves--rather than just task instructions specifying "what to do". Understanding this dynamics-descriptive language is important for human-agent interaction and agent behavior. Recent work addresses this problem using a model-based approach: language is incorporated into a world model, which is then used to learn a behavior policy. However, these existing methods either do not demonstrate policy generalization to unseen games or rely on limiting assumptions, for instance, that the latency induced by inference-time planning is tolerable for the target task or that expert demonstrations are available. Expanding on this line of research, we focus on improving policy generalization from a language-conditioned world model while dropping these assumptions. We propose a model-based reinforcement learning approach, where a language-conditioned world model is trained through interaction with the environment, and a policy is learned from this model--without planning or expert demonstrations. Our method, the Language-aware Encoder for Dreamer World Model (LED-WM), is built on top of DreamerV3. LED-WM features an observation encoder that uses an attention mechanism to explicitly ground language descriptions to entities in the observation. We show that policies trained with LED-WM generalize more effectively to unseen games described by novel dynamics and language than other baselines in several settings in two environments: MESSENGER and MESSENGER-WM. To highlight how the policy can leverage the trained world model before real-world deployment, we demonstrate that the policy can be improved through fine-tuning on synthetic test trajectories generated by the world model.
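Illustrative sketch: the abstract says the encoder uses an attention mechanism to ground language descriptions to entities in the observation. The snippet below shows one generic way such grounding can work, using scaled dot-product attention with entity vectors as queries and sentence vectors as keys and values. It is not the LED-WM implementation; the function name `ground_entities`, the toy 2-dimensional embeddings, and the choice of scaled dot-product attention are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ground_entities(entity_vecs, sentence_vecs):
    """Hypothetical helper: attend from each entity to every description sentence.

    entity_vecs:   list of E vectors (queries), each of length d
    sentence_vecs: list of S vectors (keys and values), each of length d
    Returns (grounded, weights): E language-informed vectors and an E x S
    matrix of attention weights.
    """
    d = len(entity_vecs[0])
    grounded, weights = [], []
    for q in entity_vecs:
        # Scaled dot-product score between this entity and each sentence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in sentence_vecs]
        w = softmax(scores)
        # Weighted mix of sentence vectors: the text most relevant to the entity.
        mixed = [sum(wj * k[i] for wj, k in zip(w, sentence_vecs))
                 for i in range(d)]
        grounded.append(mixed)
        weights.append(w)
    return grounded, weights


# Toy data (assumed): two entities and three description sentences.
entities = [[1.0, 0.0], [0.0, 1.0]]
sentences = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
grounded, weights = ground_entities(entities, sentences)

for i, w in enumerate(weights):
    best = max(range(len(w)), key=lambda j: w[j])
    print(f"entity {i} attends most to sentence {best}")
```

In this toy setup each entity puts most of its attention on the sentence whose vector points in the same direction. A learned encoder would use trained embeddings and projections in place of these hand-picked vectors.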

Related papers

From Word to World: Can Large Language Models be Implicit Text-based World Models? [82.47317196099907]
Agentic reinforcement learning increasingly relies on experience-driven scaling. World models offer a potential way to improve learning efficiency through simulated experience. We study whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents.
arXiv Detail & Related papers (2025-12-21T17:28:42Z)
Dynamic Context-Aware Scene Reasoning Using Vision-Language Alignment in Zero-Shot Real-World Scenarios [0.0]
This work introduces a Dynamic Context-Aware Scene Reasoning framework to address zero-shot real-world scenarios. The proposed approach integrates pre-trained vision transformers and large language models to align visual semantics with natural language descriptions. Experiments demonstrate up to 18% improvement in scene understanding accuracy over baseline models in complex and unseen environments.
arXiv Detail & Related papers (2025-10-30T15:07:55Z)
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines. We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
Language-Guided World Models: A Model-Based Approach to AI Control [31.9089380929602]
This paper introduces the concept of Language-Guided World Models (LWMs). LWMs are probabilistic models that can simulate environments by reading texts. We take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions.
arXiv Detail & Related papers (2024-01-24T03:11:36Z)
Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions. We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
arXiv Detail & Related papers (2022-09-30T19:56:04Z)
Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings. We demonstrate that this framework enables effective generalization across different environments. For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
Emergent Communication with World Models [80.55287578801008]
We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks.
arXiv Detail & Related papers (2020-02-22T02:34:51Z)
