Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments
- URL: http://arxiv.org/abs/2601.01075v1
- Date: Sat, 03 Jan 2026 05:22:27 GMT
- Title: Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments
- Authors: Hansen Jin Lillemark, Benhao Huang, Fangneng Zhan, Yilun Du, Thomas Anderson Keller
- Abstract summary: Embodied systems experience the world as 'a symphony of flows'. Most neural network world models ignore this structure and instead repeatedly re-learn the same transformations from data. We introduce 'Flow Equivariant World Models', a framework in which both self-motion and external object motion are unified as one-parameter Lie group 'flows'.
- Score: 54.23746358078753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These streams obey smooth, time-parameterized symmetries, which combine through a precisely structured algebra; yet most neural network world models ignore this structure and instead repeatedly re-learn the same transformations from data. In this work, we introduce 'Flow Equivariant World Models', a framework in which both self-motion and external object motion are unified as one-parameter Lie group 'flows'. We leverage this unification to implement group equivariance with respect to these transformations, thereby providing a stable latent world representation over hundreds of timesteps. On both 2D and 3D partially observed video world modeling benchmarks, we demonstrate that Flow Equivariant World Models significantly outperform comparable state-of-the-art diffusion-based and memory-augmented world modeling architectures -- particularly when there are predictable world dynamics outside the agent's current field of view. We show that flow equivariance is particularly beneficial for long rollouts, generalizing far beyond the training horizon. By structuring world model representations with respect to internal and external motion, flow equivariance charts a scalable route to data efficient, symmetry-guided, embodied intelligence. Project link: https://flowequivariantworldmodels.github.io.
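The abstract's central property can be illustrated with a toy example: a map f is equivariant to a one-parameter flow g_t when f(g_t x) = g_t f(x), so the representation transforms predictably as the world moves. Below is a minimal sketch, not from the paper itself, assuming a 1D signal, a discrete translation flow, and a circular convolution standing in for the equivariant map (all names are illustrative):

```python
import numpy as np

def flow(x, t, v=1):
    """One-parameter translation flow g_t: shift the signal by v*t samples."""
    return np.roll(x, v * t)

def equivariant_map(x, kernel=np.array([0.25, 0.5, 0.25])):
    """Circular convolution: a map that commutes with the translation flow."""
    n, pad = len(x), len(kernel) // 2
    xp = np.concatenate([x[-pad:], x, x[:pad]])  # circular padding
    return np.convolve(xp, kernel, mode="valid")

# Equivariance check: applying the flow before or after the map agrees,
# so the latent representation stays stable as the "world" moves.
x = np.sin(np.linspace(0, 2 * np.pi, 16, endpoint=False))
for t in range(8):
    assert np.allclose(equivariant_map(flow(x, t)), flow(equivariant_map(x), t))
```

The paper's contribution is to enforce this commutation property for both self-motion and external object motion flows inside a recurrent world model; the convolution here is only the simplest instance of a flow-equivariant map.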
Related papers
- FlowNet: Modeling Dynamic Spatio-Temporal Systems via Flow Propagation [43.89691389856747]
Accurately modeling complex spatio-temporal systems requires capturing flow-mediated interdependencies and context-sensitive interaction dynamics. Existing methods, predominantly graph-based or attention-driven, rely on similarity-driven connectivity assumptions and overlook the asymmetric flow exchanges that govern system evolution. We propose Spatio-Temporal Flow, a physics-inspired paradigm that explicitly models dynamic node coupling through quantifiable flow transfers governed by conservation principles. Experiments demonstrate that FlowNet significantly outperforms existing state-of-the-art approaches on seven metrics in the modeling of three real-world systems, validating its efficiency and physical interpretability.
arXiv Detail & Related papers (2025-11-05T14:06:19Z) - On the flow matching interpretability [2.816392009888047]
We propose a framework constraining each flow step to be sampled from a known physical distribution. Flow trajectories are mapped to (and constrained to traverse) the equilibrium states of the simulated physical process. This demonstrates that embedding physical semantics into generative flows transforms neural trajectories into interpretable physical processes.
arXiv Detail & Related papers (2025-10-24T07:26:45Z) - SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models [42.814012901180774]
SAMPO is a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation. We show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control. We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks.
arXiv Detail & Related papers (2025-09-19T02:41:37Z) - Kuramoto Orientation Diffusion Models [67.0711709825854]
Orientation-rich images, such as fingerprints and textures, often exhibit coherent angular patterns. Motivated by the role of phase synchronization in biological systems, we propose a score-based generative model. It achieves competitive results on general image benchmarks and significantly improves generation quality on orientation-dense datasets like fingerprints and textures.
arXiv Detail & Related papers (2025-09-18T18:18:49Z) - Flow Equivariant Recurrent Neural Networks [2.900810893770134]
In machine learning, neural network architectures that respect symmetries of their data are called equivariant. We extend equivariant network theory to this regime of 'flows', capturing natural transformations over time. We show that these models significantly outperform their non-equivariant counterparts in terms of training speed, length generalization, and velocity generalization.
arXiv Detail & Related papers (2025-07-20T02:52:21Z) - Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective [54.77404771454794]
We develop a flexible and robust world model for Multi-Agent Reinforcement Learning (MARL) using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks.
arXiv Detail & Related papers (2025-05-27T09:11:38Z) - Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z) - DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [53.70278210626701]
We propose a data-driven multi-view reasoning approach that directly infers 3D scene geometry and camera poses from multi-view images. Our framework, DiffusionSfM, parameterizes scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame. We empirically validate DiffusionSfM on both synthetic and real datasets, demonstrating that it outperforms classical and learning-based approaches.
arXiv Detail & Related papers (2025-05-08T17:59:47Z) - EvoFed: Leveraging Evolutionary Strategies for Communication-Efficient Federated Learning [15.124439914522693]
Federated Learning (FL) is a decentralized machine learning paradigm that enables collaborative model training across dispersed nodes.
This paper presents EvoFed, a novel approach that integrates Evolutionary Strategies (ES) with FL to address the communication costs of this paradigm.
arXiv Detail & Related papers (2023-11-13T17:25:06Z) - Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Learning Robust Dynamics through Variational Sparse Gating [18.476155786474358]
In environments with many objects, often only a small number of them are moving or interacting at the same time.
In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels.
arXiv Detail & Related papers (2022-10-21T02:56:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.