Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
- URL: http://arxiv.org/abs/2601.22647v1
- Date: Fri, 30 Jan 2026 07:06:40 GMT
- Title: Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
- Authors: Jinwoo Jang, Minjong Yoo, Sihyung Yoon, Honguk Woo
- Abstract summary: Test-time Mixture of World Models (TMoW) is a framework that enhances adaptability to unseen and evolving domains. TMoW updates its routing function over world models at test time, unlike conventional MoE where the function remains fixed. We evaluate TMoW on VirtualHome, ALFWorld, and RLBench benchmarks, demonstrating strong performance in both zero-shot adaptation and few-shot expansion scenarios.
- Score: 29.514831254621438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language model (LM)-based embodied agents are increasingly deployed in real-world settings. Yet, their adaptability remains limited in dynamic environments, where constructing accurate and flexible world models is crucial for effective reasoning and decision-making. To address this challenge, we extend the Mixture-of-Experts (MoE) paradigm to embodied agents. While conventional MoE architectures modularize knowledge into expert components with pre-trained routing, they remain rigid once deployed, making them less effective for adapting to unseen domains in dynamic environments. We therefore propose Test-time Mixture of World Models (TMoW), a framework that enhances adaptability to unseen and evolving domains. TMoW updates its routing function over world models at test time, unlike conventional MoE where the function remains fixed, enabling agents to recombine existing models and integrate new ones for continual adaptation. It achieves this through (i) multi-granular prototype-based routing, which adapts mixtures across object- to scene-level similarities, (ii) test-time refinement that aligns unseen domain features with prototypes during inference, and (iii) distilled mixture-based augmentation, which efficiently constructs new models from few-shot data and existing prototypes. We evaluate TMoW on VirtualHome, ALFWorld, and RLBench benchmarks, demonstrating strong performance in both zero-shot adaptation and few-shot expansion scenarios, and showing that it enables embodied agents to operate effectively in dynamic environments.
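The abstract names three mechanisms but gives no implementation details, so the following is a minimal, hypothetical sketch of what prototype-based routing with test-time refinement over a bank of world models could look like. The class and function names (`PrototypeRouter`, `test_time_refine`), the cosine-similarity routing, and the entropy-sharpening refinement objective are all assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch of TMoW-style routing over a bank of world models.
# PrototypeRouter, test_time_refine, and the entropy objective are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn.functional as F

class PrototypeRouter(torch.nn.Module):
    """Mixture weights over K expert world models from multi-granular prototypes."""
    def __init__(self, num_experts: int, feat_dim: int):
        super().__init__()
        # One learnable prototype per expert at each granularity.
        self.object_protos = torch.nn.Parameter(torch.randn(num_experts, feat_dim))
        self.scene_protos = torch.nn.Parameter(torch.randn(num_experts, feat_dim))

    def forward(self, obj_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        # Cosine similarity of the current observation's features to each
        # expert's prototypes, combined across object and scene granularities.
        s_obj = F.cosine_similarity(obj_feat.unsqueeze(0), self.object_protos, dim=-1)
        s_scene = F.cosine_similarity(scene_feat.unsqueeze(0), self.scene_protos, dim=-1)
        return F.softmax(s_obj + s_scene, dim=-1)  # (K,) mixture weights

def test_time_refine(router, obj_feat, scene_feat, steps: int = 5, lr: float = 1e-3):
    """Adapts the routing at inference by nudging prototypes toward the
    unseen domain; a generic entropy-sharpening objective stands in for
    whatever alignment loss the paper actually uses."""
    opt = torch.optim.SGD(router.parameters(), lr=lr)
    for _ in range(steps):
        w = router(obj_feat, scene_feat)
        loss = -(w * torch.log(w + 1e-8)).sum()  # entropy of the mixture
        opt.zero_grad()
        loss.backward()
        opt.step()
    return router
```

With refined weights `w` of shape `(K,)`, the mixture prediction would simply be the weighted sum of the K experts' next-state predictions, e.g. `next_state = (w.unsqueeze(-1) * expert_preds).sum(dim=0)`. The point the abstract stresses is that the routing function, not the expert bank, is what updates at test time, in contrast to conventional MoE with a fixed router.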
Related papers
- Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems [38.4555621948915]
Prismatic World Model (PRISM-WM) is designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM significantly reduces rollout drift by accurately modeling sharp mode transitions in system dynamics.
arXiv Detail & Related papers (2025-12-09T09:40:34Z) - DyMoDreamer: World Modeling with Dynamic Modulation [52.27044216359359]
A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. We introduce DyMoDreamer, a novel algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich temporal information. Experiments demonstrate that DyMoDreamer sets a new state of the art on the Atari 100k benchmark with a 156.6% mean human-normalized score.
arXiv Detail & Related papers (2025-09-29T13:54:42Z) - World Model Implanting for Test-time Adaptation of Embodied Agents [29.514831254621438]
In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. We present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models with independently learned, domain-specific world models. We evaluate WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches.
arXiv Detail & Related papers (2025-09-04T07:32:16Z) - Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective [54.77404771454794]
We develop a flexible and robust world model for Multi-Agent Reinforcement Learning (MARL) using diffusion models. Our method, the Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks.
arXiv Detail & Related papers (2025-05-27T09:11:38Z) - Learning Transformer-based World Models with Contrastive Predictive Coding [58.0159270859475]
We show that the next-state prediction objective is insufficient to fully exploit the representation capabilities of Transformers. We propose to extend world model predictions to longer time horizons by introducing TWISTER, a world model using action-conditioned Contrastive Predictive Coding. TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search.
arXiv Detail & Related papers (2025-03-06T13:18:37Z) - Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z) - Masked Generative Priors Improve World Models Sequence Modelling Capabilities [23.48066383072968]
Masked generative modelling has emerged as a more efficient and superior inductive bias for sequence modelling. GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark. We apply Transformer-based world models to continuous-action environments for the first time, addressing a significant gap in prior research.
arXiv Detail & Related papers (2024-10-10T11:52:07Z) - Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.35361897941898]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability. We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation. Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z) - VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model [10.36125908359289]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model. Our method achieves high sample efficiency and exhibits superior performance compared to other baselines across a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-09-08T22:12:43Z) - ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation [48.039156140237615]
A Continual Test-Time Adaptation (CTTA) task is proposed to adapt the pre-trained model to continually changing target domains.
We design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-shared knowledge.
Our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-06-07T11:18:53Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)