3D and 4D World Modeling: A Survey
- URL: http://arxiv.org/abs/2509.07996v2
- Date: Thu, 11 Sep 2025 15:54:05 GMT
- Title: 3D and 4D World Modeling: A Survey
- Authors: Lingdong Kong, Wesley Yang, Jianbiao Mei, Youquan Liu, Ao Liang, Dekai Zhu, Dongyue Lu, Wei Yin, Xiaotao Hu, Mingkai Jia, Junyuan Deng, Kaiwen Zhang, Yang Wu, Tianyi Yan, Shenyuan Gao, Song Wang, Linfeng Li, Liang Pan, Yong Liu, Jianke Zhu, Wei Tsang Ooi, Steven C. H. Hoi, Ziwei Liu
- Abstract summary: World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. We introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches. We discuss practical applications, identify open challenges, and highlight promising research directions.
- Score: 104.20852751473392
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. While prior work largely emphasizes generative methods for 2D image and video data, it overlooks the rapidly growing body of work that leverages native 3D and 4D representations such as RGB-D imagery, occupancy grids, and LiDAR point clouds for large-scale scene modeling. At the same time, the absence of a standardized definition and taxonomy for "world models" has led to fragmented and sometimes inconsistent claims in the literature. This survey addresses these gaps by presenting the first comprehensive review explicitly dedicated to 3D and 4D world modeling and generation. We establish precise definitions, introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches, and systematically summarize datasets and evaluation metrics tailored to 3D/4D settings. We further discuss practical applications, identify open challenges, and highlight promising research directions, aiming to provide a coherent and foundational reference for advancing the field. A systematic summary of existing literature is available at https://github.com/worldbench/survey
Related papers
- 3D Shape Generation: A Survey [0.6445605125467574]
Recent advances in deep learning have transformed the field of 3D shape generation. This survey organizes the discussion around three core components: shape representations, generative modeling approaches, and evaluation protocols. We identify open challenges and outline future research directions that could drive progress in controllable, efficient, and high-quality 3D shape generation.
arXiv Detail & Related papers (2025-06-27T23:06:06Z)
- E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models [78.1674905950243]
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs). GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters. We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains. All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z)
- 3D Scene Generation: A Survey [41.202497008985425]
3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. This review organizes recent advances in 3D scene generation and highlights promising directions at the intersection of generative AI, 3D vision, and embodied intelligence.
arXiv Detail & Related papers (2025-05-08T17:59:54Z)
- TesserAct: Learning 4D Embodied World Models [66.8519958275311]
We learn a 4D world model by training on RGB-DN (RGB, Depth, and Normal) videos. This not only surpasses traditional 2D models by incorporating detailed shape, configuration, and temporal changes into its predictions, but also allows us to effectively learn accurate inverse dynamics models for an embodied agent.
arXiv Detail & Related papers (2025-04-29T17:59:30Z)
- Advances in 4D Generation: A Survey [23.041037534410773]
4D generation enables richer interactive and immersive experiences. Despite rapid progress, the field lacks a unified understanding of 4D representations, generative frameworks, basic paradigms, and the core technical challenges it faces. This survey provides a systematic and in-depth review of the 4D generation landscape.
arXiv Detail & Related papers (2025-03-18T17:59:51Z)
- Simulating the Real World: A Unified Survey of Multimodal Generative Models [48.35284571052435]
We present a unified survey of multimodal generative models that investigates the progression of data dimensionality in real-world simulation. To the best of our knowledge, this is the first attempt to systematically unify the study of 2D, video, 3D, and 4D generation within a single framework.
arXiv Detail & Related papers (2025-03-06T17:31:43Z)
- Advances in 3D Generation: A Survey [54.95024616672868]
The field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models.
Specifically, we introduce the 3D representations that serve as the backbone for 3D generation.
We provide a comprehensive overview of the rapidly growing literature on generation methods, categorized by the type of algorithmic paradigms.
arXiv Detail & Related papers (2024-01-31T13:06:48Z)
- Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data in order to generate new instances. Recently, researchers have started to shift focus from 2D to 3D space. However, representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.