Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
- URL: http://arxiv.org/abs/2602.01270v1
- Date: Sun, 01 Feb 2026 15:00:30 GMT
- Title: Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
- Authors: Boxuan Zhang, Weipu Zhang, Zhaohan Feng, Wei Xiao, Jian Sun, Jie Chen, Gang Wang,
- Abstract summary: We introduce Mixture-of-World Models (MoW), a scalable architecture that combines modular variational autoencoders for task-adaptive visual compression.<n>On the Atari 100k benchmark, a single MoW agent trained once on 26 Atari games achieves a mean human-normalized score of 110.4%.<n>MoW achieves a 74.5% average success rate within 300 thousand environment steps, establishing a new state of the art.
- Score: 39.854659234873644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in multi-task reinforcement learning (MTRL) is achieving sample efficiency in visual domains where tasks exhibit substantial heterogeneity in both observations and dynamics. Model-based reinforcement learning offers a promising path to improved sample efficiency through world models, but standard monolithic architectures struggle to capture diverse task dynamics, resulting in poor reconstruction and prediction accuracy. We introduce Mixture-of-World Models (MoW), a scalable architecture that combines modular variational autoencoders for task-adaptive visual compression, a hybrid Transformer-based dynamics model with task-conditioned experts and a shared backbone, and a gradient-based task clustering strategy for efficient parameter allocation. On the Atari 100k benchmark, a single MoW agent trained once on 26 Atari games achieves a mean human-normalized score of 110.4%, competitive with the score of 114.2% achieved by STORM, an ensemble of 26 task-specific models, while using 50% fewer parameters. On Meta-World, MoW achieves a 74.5% average success rate within 300 thousand environment steps, establishing a new state of the art. These results demonstrate that MoW provides a scalable and parameter-efficient foundation for generalist world models.
Related papers
- DyMoDreamer: World Modeling with Dynamic Modulation [52.27044216359359]
A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions.<n>We introduce DyMoDreamer, a novel algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich the temporal information.<n>Experiments demonstrate that DyMoDreamer sets a new state-of-the-art on the Atari $100$k benchmark with a $156.6$% mean human-normalized score.
arXiv Detail & Related papers (2025-09-29T13:54:42Z) - TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents [1.6574413179773757]
We present a novel approach to knowledge transfer in model-based reinforcement learning.<n>Our method efficiently distills a high-capacity multi-task agent into a compact model.<n>Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models.
arXiv Detail & Related papers (2025-07-02T15:38:49Z) - Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning [2.3964255330849356]
We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model for reinforcement learning.<n>MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module.<n>We evaluate MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot.
arXiv Detail & Related papers (2025-06-23T11:47:17Z) - Learning Transformer-based World Models with Contrastive Predictive Coding [58.0159270859475]
We show that the next state prediction objective is insufficient to fully exploit the representation capabilities of Transformers.<n>We propose to extend world model predictions to longer time horizons by introducing TWISTER, a world model using action-conditioned Contrastive Predictive Coding.<n>TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search.
arXiv Detail & Related papers (2025-03-06T13:18:37Z) - HeteroTune: Efficient Federated Learning for Large Heterogeneous Models [35.53420882449293]
We propose HeteroTune, a novel federated fine-tuning paradigm for large, heterogeneous models operating under limited communication and budgets.<n>The core of our method lies in a novel architecture, DeMA, which enables flexible and efficient aggregation of heterogeneous models.<n>We provide both theoretical analysis and empirical evidence showing that HeteroTune achieves state-of-the-art performance and efficiency across diverse tasks and model architectures.
arXiv Detail & Related papers (2024-11-25T09:58:51Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
Storm achieves a mean human performance of $126.7%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z) - VDFD: Multi-Agent Value Decomposition Framework with Disentangled World Model [10.36125908359289]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.<n>Our method achieves high sample efficiency and exhibits superior performance compared to other baselines across a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-09-08T22:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.