Next Embedding Prediction Makes World Models Stronger
- URL: http://arxiv.org/abs/2603.02765v1
- Date: Tue, 03 Mar 2026 09:04:28 GMT
- Title: Next Embedding Prediction Makes World Models Stronger
- Authors: George Bredis, Nikita Balagansky, Daniil Gavrilov, Ruslan Rakhimov
- Abstract summary: We introduce NE-Dreamer, a decoder-free model-based reinforcement learning agent. We use a temporal transformer to predict next-step encoder embeddings from latent state sequences. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents.
- Score: 9.30425021795895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
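The core objective described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; the module names, hyperparameters, and the stop-gradient choice on the targets are assumptions for illustration. It shows a causal temporal transformer regressing the next-step encoder embedding, with no reconstruction loss.

```python
import torch
import torch.nn as nn

class NextEmbeddingPredictor(nn.Module):
    """Toy decoder-free world-model head: a causal temporal transformer
    that predicts the encoder embedding of the next time step."""
    def __init__(self, embed_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, embed_dim)

    def forward(self, z):  # z: (B, T, D) sequence of encoder embeddings
        T = z.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.temporal(z, mask=causal)
        return self.head(h)  # predicted embeddings, one per step

def next_embedding_loss(model, z):
    """MSE between the prediction at step t and the encoder embedding
    at step t+1; targets are detached (a common stabilizing choice)."""
    pred = model(z)[:, :-1]      # predictions for steps 1..T-1
    target = z[:, 1:].detach()   # next-step targets, no gradient
    return ((pred - target) ** 2).mean()

# toy usage on random embedding sequences
model = NextEmbeddingPredictor()
z = torch.randn(8, 16, 64)  # (batch, time, embed_dim)
loss = next_embedding_loss(model, z)
```

Because the loss lives entirely in representation space, no pixel decoder is needed, which is the "decoder-free" property the abstract emphasizes.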
Related papers
- PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks [70.1286354746363]
Spiking Neural Networks (SNNs) offer a natural platform for unsupervised representation learning. Current unsupervised SNNs employ shallow architectures or localized plasticity rules, limiting their ability to model long-range temporal dependencies. We propose PredNext, which explicitly models temporal relationships through cross-view future Step Prediction and Clip Prediction.
arXiv Detail & Related papers (2025-09-29T14:27:58Z) - A Time-Series Foundation Model by Universal Delay Embedding [4.221753069966852]
This study introduces Universal Delay Embedding (UDE), a pretrained foundation model designed to revolutionize time-series forecasting. As a dynamical representation of observed data, UDE constructs two-dimensional subspace patches from Hankel matrices. In particular, the learned dynamical representations and the Koopman-operator predictions formed from the patches exhibit exceptional interpretability.
arXiv Detail & Related papers (2025-09-15T16:11:49Z) - Towards Efficient General Feature Prediction in Masked Skeleton Modeling [59.46799426434277]
We propose a novel General Feature Prediction framework (GFP) for efficient masked skeleton modeling. Our key innovation is replacing conventional low-level reconstruction with high-level feature prediction that spans from local motion patterns to global semantic representations.
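The idea of swapping low-level reconstruction for high-level feature prediction can be sketched generically. This is not the GFP paper's code; the zero-out masking, the teacher/student split, and all names are illustrative assumptions. The objective regresses a teacher encoder's features at masked positions rather than the raw input values.

```python
import torch
import torch.nn as nn

def masked_feature_prediction_loss(student, teacher, x, mask_ratio=0.5):
    """Generic masked-modeling objective: hide part of the input
    sequence and regress the teacher's high-level features at the
    masked positions instead of reconstructing raw inputs.
    `student` and `teacher` are assumed to map (B, T, D) -> (B, T, D)."""
    B, T, _ = x.shape
    mask = torch.rand(B, T) < mask_ratio  # True = masked position
    x_masked = x.clone()
    x_masked[mask] = 0.0                  # simple zero-out masking
    with torch.no_grad():
        target = teacher(x)               # feature targets, no gradient
    pred = student(x_masked)
    return ((pred - target) ** 2)[mask].mean()

# toy usage with linear encoders standing in for real networks
student = nn.Linear(32, 32)
teacher = nn.Linear(32, 32)
x = torch.randn(4, 10, 32)  # (batch, time, feature_dim)
loss = masked_feature_prediction_loss(student, teacher, x)
```

Restricting the loss to masked positions forces the student to infer hidden content from context, which is the mechanism masked modeling relies on.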
arXiv Detail & Related papers (2025-09-03T18:05:02Z) - T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders [2.8820361301109365]
SHallow REcurrent Decoders (SHRED) are effective for system identification and forecasting from sparse sensor measurements. We improve SHRED by leveraging transformers (T-SHRED) for the temporal encoding, which improves performance on next-step state prediction. Symbolic regression improves model interpretability by learning and regularizing the dynamics of the latent space during training.
arXiv Detail & Related papers (2025-06-18T21:14:38Z) - PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting [30.055634767677823]
In urban computing, precise and swift forecasting of time-series data from traffic networks is crucial. Current approaches are limited by inherent model inefficiency and, owing to their complexity, are unsuitable for large-scale traffic applications. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on the principles of Multi-Layer Perceptrons (MLPs). Our framework achieves comparable state-of-the-art performance while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.
arXiv Detail & Related papers (2024-12-18T08:35:40Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce the GPT style next token motion prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z) - Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains. Recent developments show the benefits that arise when the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states. We show that when random linear recurrences are equipped with simple input-controlled transitions, the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z) - Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in phase-space dynamics.
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z) - CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This work introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.