Aether: Geometric-Aware Unified World Modeling
- URL: http://arxiv.org/abs/2503.18945v2
- Date: Tue, 25 Mar 2025 15:31:25 GMT
- Title: Aether: Geometric-Aware Unified World Modeling
- Authors: Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He
- Abstract summary: Aether is a unified framework that enables geometry-aware reasoning in world models. It achieves zero-shot generalization in both action following and reconstruction tasks. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling.
- Score: 49.33579903601599
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates unprecedented synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Remarkably, even without real-world data, its reconstruction performance is comparable with or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.
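To make the joint objective concrete, below is a minimal, hypothetical PyTorch sketch of one shared backbone feeding the three task heads described in the abstract, with a 6-DoF camera pose standing in for the geometry-informed action space. Every name, dimension, and loss here is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class UnifiedWorldModelSketch(nn.Module):
    """Shared backbone with three task heads, loosely mirroring the paper's
    jointly optimized capabilities. Names, shapes, and losses are assumptions."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.recon_head = nn.Linear(dim, dim)  # (1) 4D dynamic reconstruction features
        self.pred_head = nn.Linear(dim, dim)   # (2) action-conditioned video prediction
        self.plan_head = nn.Linear(dim, 6)     # (3) goal-conditioned planning: per-step camera pose

    def forward(self, video, action=None, goal=None):
        h = self.backbone(video)  # task-interleaved shared features
        recon = self.recon_head(h)
        pred = self.pred_head(h + action) if action is not None else None
        plan = self.plan_head(h + goal) if goal is not None else None
        return recon, pred, plan

# One joint optimization step over all three objectives (synthetic batch,
# placeholder squared-error losses purely for illustration).
model = UnifiedWorldModelSketch()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
video, action, goal = torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 256)
recon, pred, plan = model(video, action, goal)
loss = recon.pow(2).mean() + pred.pow(2).mean() + plan.pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Under this reading, the camera trajectory plays a dual role: it conditions the prediction head as an input action and is emitted by the planning head as an output, which is one plausible way the three objectives could share features as the abstract describes.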
Related papers
- Attention to Detail: Fine-Scale Feature Preservation-Oriented Geometric Pre-training for AI-Driven Surrogate Modeling [6.34618828355523]
AI-driven surrogate modeling has become an increasingly effective alternative to physics-based simulations for 3D design, analysis, and manufacturing.
This work introduces a self-supervised geometric representation learning method designed to capture fine-scale geometric features from non-parametric 3D models.
arXiv Detail & Related papers (2025-04-27T17:10:13Z) - DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction [24.44253219419552]
One-shot view planning enables efficient data collection by predicting all views at once.
By conditioning on initial multi-view images, we exploit the priors from the 3D diffusion model to generate an approximate object model.
We validate the proposed active object reconstruction system through both simulation and real-world experiments.
arXiv Detail & Related papers (2025-04-16T00:14:52Z) - ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z) - A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion [16.379647695019308]
3D scene reconstruction is a foundational problem in computer vision. We introduce the Gaussian Object Carver (GOC), a novel, efficient, and scalable framework for object-compositional 3D scene reconstruction. GOC leverages 3D Gaussian Splatting (GS), enriched with monocular geometry priors and multi-view geometry regularization, to achieve high-quality and flexible reconstruction.
arXiv Detail & Related papers (2024-12-03T01:34:39Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video [13.510513575340106]
Reconstructing dynamic articulated objects from a singular monocular video is challenging, requiring joint estimation of shape, motion, and camera parameters from limited views.
We propose Synergistic Shape and Skeleton Optimization (S3O), a novel two-phase method that efficiently learns parametric models including visible shapes and underlying skeletons.
Our experimental evaluations on standard benchmarks and the PlanetZoo dataset affirm that S3O provides more accurate 3D reconstructions and plausible skeletons, and reduces training time by approximately 60% compared to the state of the art.
arXiv Detail & Related papers (2024-05-21T09:01:00Z) - REACTO: Reconstructing Articulated Objects from a Single Video [64.89760223391573]
We propose a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints.
Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects.
arXiv Detail & Related papers (2024-04-17T08:01:55Z) - Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference [3.2229099973277076]
This dissertation presents a set of contributions that advance 3D scene modeling to the state of the art.
In contrast to the prevailing deep learning methods, as a core contribution, this thesis aims to develop algorithms that follow first principles.
arXiv Detail & Related papers (2024-04-01T17:09:40Z) - Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning [24.44253219419552]
We propose a novel one-shot view planning approach that utilizes the powerful 3D generation capabilities of diffusion models as priors.
Our experiments in simulation and real-world setups indicate that our approach balances well between object reconstruction quality and movement cost.
arXiv Detail & Related papers (2024-03-25T14:21:49Z) - Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our framework unifies 2D correspondence matching, camera pose estimation, and NeRF rendering, allowing these tasks to reinforce one another.
arXiv Detail & Related papers (2023-12-12T13:22:44Z) - MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis [26.710960922302124]
We propose a real-world Multi-Sensor Hybrid Room dataset (MuSHRoom). Our dataset presents exciting challenges and requires state-of-the-art methods to be cost-effective and robust to noisy data and devices. We benchmark several famous pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis.
arXiv Detail & Related papers (2023-11-05T21:46:12Z) - LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction [5.107705550575662]
LIST is a novel neural architecture that leverages local and global image features to reconstruct the geometric and topological structure of a 3D object from a single image.
We show that our model outperforms the state of the art in reconstructing 3D objects from both synthetic and real-world images.
arXiv Detail & Related papers (2023-07-23T01:01:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.