Zero-shot World Models via Search in Memory
- URL: http://arxiv.org/abs/2510.16123v1
- Date: Fri, 17 Oct 2025 18:09:36 GMT
- Title: Zero-shot World Models via Search in Memory
- Authors: Federico Malato, Ville Hautamäki
- Abstract summary: We leverage similarity search and stochastic representations to approximate a world model without a training procedure. We evaluate the models on the quality of latent reconstruction and on the perceived similarity of the reconstructed image. Our model shows stronger performance in long-horizon prediction with respect to the baseline on a range of visually different environments.
- Score: 7.15414423703749
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: World Models have vastly permeated the field of Reinforcement Learning. Their ability to model the transition dynamics of an environment has greatly improved sample efficiency in online RL. Among them, the most notable example is Dreamer, a model that learns to act in a diverse set of image-based environments. In this paper, we leverage similarity search and stochastic representations to approximate a world model without a training procedure. We establish a comparison with PlaNet, a well-established world model of the Dreamer family. We evaluate the models on the quality of latent reconstruction and on the perceived similarity of the reconstructed image, on both next-step and long-horizon dynamics prediction. The results of our study demonstrate that a search-based world model is comparable to a training-based one in both cases. Notably, our model shows stronger performance in long-horizon prediction with respect to the baseline on a range of visually different environments.
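The abstract's core idea, predicting dynamics by similarity search over a memory of stored transitions instead of by training a model, can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' code: the class name, the plain L2 nearest-neighbor lookup, and the toy dynamics are all assumptions for demonstration.

```python
# Hypothetical sketch of a search-based world model: predict the next
# latent state by nearest-neighbor lookup in a memory of observed
# (latent, action, next_latent) transitions. Not the paper's implementation.
import numpy as np

class SearchWorldModel:
    def __init__(self):
        self.keys = []     # concatenated (latent, action) query vectors
        self.values = []   # observed next latents

    def add(self, z, a, z_next):
        """Store one observed transition in memory."""
        self.keys.append(np.concatenate([z, a]))
        self.values.append(z_next)

    def predict(self, z, a, k=3):
        """Predict the next latent as the mean of the k most similar
        stored transitions (L2 similarity search over the memory)."""
        query = np.concatenate([z, a])
        dists = np.linalg.norm(np.stack(self.keys) - query, axis=1)
        nearest = np.argsort(dists)[:k]
        return np.stack([self.values[i] for i in nearest]).mean(axis=0)

# Fill the memory with toy transitions, then query it zero-shot.
wm = SearchWorldModel()
rng = np.random.default_rng(0)
for _ in range(100):
    z, a = rng.normal(size=8), rng.normal(size=2)
    wm.add(z, a, z + 0.1 * a.sum())   # toy linear dynamics
z_pred = wm.predict(rng.normal(size=8), rng.normal(size=2))
```

Long-horizon rollouts follow by feeding each predicted latent back in as the next query, which is where the paper reports its strongest results relative to the trained baseline.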
Related papers
- Co-Evolving Latent Action World Models [57.48921576959243]
Adapting pre-trained video models into controllable world models via latent actions is a promising step towards creating generalist world models. We propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.
arXiv Detail & Related papers (2025-10-30T12:28:40Z) - Can World Models Benefit VLMs for World Dynamics? [59.73433292793044]
We investigate the capabilities when world model priors are transferred into Vision-Language Models. We name our best-performing variant Dynamic Vision Aligner (DyVA). We find DyVA to surpass both open-source and proprietary baselines, achieving state-of-the-art or comparable performance.
arXiv Detail & Related papers (2025-10-01T13:07:05Z) - Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models [37.774994737939394]
We use dynamics models to bootstrap world models using synthetic data and inference-time verification. Our best model achieves performance competitive with state-of-the-art image editing models, improving on them by a margin of 15% on real-world subsets according to GPT4o-as-judge.
arXiv Detail & Related papers (2025-06-06T11:50:18Z) - World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks [55.90051810762702]
We present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization.
arXiv Detail & Related papers (2025-05-31T06:43:00Z) - Learning Local Causal World Models with State Space Models and Attention [1.5498250598583487]
We show that an SSM can model the dynamics of a simple environment and learn a causal model at the same time. We pave the way for further experiments that lean into the strengths of SSMs and further enhance them with causal awareness.
arXiv Detail & Related papers (2025-05-04T11:57:02Z) - Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning [6.747704665696368]
In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions. We show that performing the dreaming process inside the latent space allows for training with fewer environment steps. We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.
arXiv Detail & Related papers (2025-02-28T15:24:17Z) - EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling [8.250616459360684]
We introduce EDELINE, a unified world model architecture that integrates state space models with diffusion models. Our approach outperforms existing baselines across visually challenging Atari 100k tasks, memory-demanding benchmarks, and 3D first-person ViZDoom environments.
arXiv Detail & Related papers (2025-02-01T15:49:59Z) - FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios. We contribute a million-scale dataset with two notable advantages over existing training data. We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z) - Consistency-diversity-realism Pareto fronts of conditional image generative models [22.372033071088424]
We use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts.
Our experiments suggest that realism and consistency can both be improved simultaneously.
Our analysis shows that there is no best model and the choice of model should be determined by the downstream application.
arXiv Detail & Related papers (2024-06-14T22:14:11Z) - Diffusion for World Modeling: Visual Details Matter in Atari [22.915802013352465]
We introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.
We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance.
DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model.
arXiv Detail & Related papers (2024-05-20T22:51:05Z) - HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z) - Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.