Dyn-O: Building Structured World Models with Object-Centric Representations
- URL: http://arxiv.org/abs/2507.03298v1
- Date: Fri, 04 Jul 2025 05:06:15 GMT
- Title: Dyn-O: Building Structured World Models with Object-Centric Representations
- Authors: Zizhao Wang, Kaixin Wang, Li Zhao, Peter Stone, Jiang Bian
- Abstract summary: We introduce Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. We find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy.
- Score: 42.65409148846005
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: World models aim to capture the dynamics of the environment, enabling agents to predict and plan for future states. In most scenarios of interest, the dynamics are highly centered on interactions among objects within the environment. This motivates the development of world models that operate on object-centric rather than monolithic representations, with the goal of more effectively capturing environment dynamics and enhancing compositional generalization. However, the development of object-centric world models has largely been explored in environments with limited visual complexity (such as basic geometries). It remains underexplored whether such models can generalize to more complex settings with diverse textures and cluttered scenes. In this paper, we fill this gap by introducing Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. On the challenging Procgen games, we find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy. Furthermore, by decoupling object-centric features into dynamics-agnostic and dynamics-aware components, we enable finer-grained manipulation of these features and generate more diverse imagined trajectories.
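The abstract's last claim, decoupling each object's features into a dynamics-agnostic (appearance) part and a dynamics-aware (motion) part so they can be recombined to imagine new trajectories, can be illustrated with a minimal sketch. This is not Dyn-O's actual architecture; the split point, dimensionalities, and function names below are illustrative assumptions only.

```python
import numpy as np

def split_object_feature(z, k):
    """Split an object's latent vector z into a dynamics-agnostic part
    (e.g. appearance) and a dynamics-aware part (e.g. motion), under the
    toy assumption that the first k dimensions are dynamics-agnostic."""
    return z[:k], z[k:]

def recombine(static_part, dynamic_part):
    """Recombine components into a full latent, e.g. to imagine a variant
    trajectory by pairing one object's appearance with another's motion."""
    return np.concatenate([static_part, dynamic_part])

rng = np.random.default_rng(0)
z_a = rng.normal(size=8)  # latent for object A
z_b = rng.normal(size=8)  # latent for object B

s_a, d_a = split_object_feature(z_a, 4)
s_b, d_b = split_object_feature(z_b, 4)

# Imagined variant: object A's appearance driven by object B's dynamics.
z_mix = recombine(s_a, d_b)
```

In a learned model the split would be enforced by training objectives rather than a fixed index, but the manipulation pattern (split, swap, recombine) is the same.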
Related papers
- Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos [30.367498271886866]
We develop a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. We demonstrate that our model learns the dynamics of diverse objects from sparse-view RGB-D recordings of robot-object interactions. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views.
arXiv Detail & Related papers (2025-06-18T17:59:38Z)
- SlotPi: Physics-informed Object-centric Reasoning Models [37.32107835829927]
We introduce SlotPi, a physics-informed object-centric reasoning model. Our experiments highlight the model's strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets. We have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model's capabilities.
arXiv Detail & Related papers (2025-06-12T14:53:36Z)
- Aether: Geometric-Aware Unified World Modeling [49.33579903601599]
Aether is a unified framework that enables geometry-aware reasoning in world models. It achieves zero-shot generalization in both action following and reconstruction tasks. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling.
arXiv Detail & Related papers (2025-03-24T17:59:51Z)
- Inter-environmental world modeling for continuous and compositional dynamics [7.01176359680407]
We introduce Lie Action, an unsupervised framework that learns continuous latent action representations to simulate across environments. We demonstrate that WLA can be trained using only video frames and, with minimal or no action labels, can quickly adapt to new environments with novel action sets.
arXiv Detail & Related papers (2025-03-13T00:02:54Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning. Voxelization infers per-object occupancy probabilities at individual spatial locations. Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Unsupervised Dynamics Prediction with Object-Centric Kinematics [22.119612406160073]
We propose Object-Centric Kinematics (OCK), a framework for dynamics prediction leveraging object-centric representations.
OCK consists of low-level structured states of objects' position, velocity, and acceleration.
Our model demonstrates superior performance when handling objects and backgrounds in complex scenes characterized by a wide range of object attributes and dynamic movements.
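The low-level structured state that the OCK summary describes (per-object position, velocity, and acceleration) can be sketched with a generic constant-acceleration update. This is standard kinematics, not the paper's learned transition model; the function name and time step are illustrative assumptions.

```python
import numpy as np

def step_kinematics(pos, vel, acc, dt):
    """One explicit step of a per-object kinematic state (position,
    velocity, acceleration), assuming acceleration is constant over dt.
    Models like OCK track such structured states per object; here the
    transition is hand-coded rather than learned."""
    new_vel = vel + acc * dt
    new_pos = pos + vel * dt + 0.5 * acc * dt ** 2
    return new_pos, new_vel

# Example: a 2D object moving right while falling under gravity.
pos = np.array([0.0, 0.0])
vel = np.array([1.0, 0.0])
acc = np.array([0.0, -9.8])
pos, vel = step_kinematics(pos, vel, acc, dt=0.1)
```

A learned dynamics model would replace the fixed update with a network conditioned on these structured states and the object-centric appearance features.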
arXiv Detail & Related papers (2024-04-29T04:47:23Z)
- Learning Physical Dynamics for Object-centric Visual Prediction [7.395357888610685]
The ability to model the underlying dynamics of visual scenes and reason about the future is central to human intelligence.
This paper proposes an unsupervised object-centric prediction model that makes future predictions by learning visual dynamics between objects.
arXiv Detail & Related papers (2024-03-15T07:45:25Z)
- Relational Object-Centric Actor-Critic [44.99833362998488]
Recent works highlight that disentangled object representations can aid policy learning in image-based, object-centric reinforcement learning tasks. This paper proposes a novel object-centric reinforcement learning algorithm that integrates actor-critic and model-based approaches. We evaluate our method in a simulated 3D robotic environment and a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization [67.85434518679382]
We present DynaVol, a 3D scene generative model that unifies geometric structures and object-centric learning.
The key idea is to perform object-centric voxelization to capture the 3D nature of the scene.
Voxel features evolve over time through a canonical-space deformation function, forming the basis for global representation learning.
arXiv Detail & Related papers (2023-04-30T05:29:28Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
The proposed method is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.