U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
- URL: http://arxiv.org/abs/2512.02982v1
- Date: Tue, 02 Dec 2025 17:59:57 GMT
- Title: U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
- Authors: Xiang Xu, Ao Liang, Youquan Liu, Linfeng Li, Lingdong Kong, Ziwei Liu, Qingshan Liu
- Abstract summary: Existing generative frameworks treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. We present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors.
- Score: 54.77163447282599
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture of spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
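To make the abstract's first mechanism concrete, below is a minimal PyTorch sketch of one natural reading of the uncertainty estimation step: per-point predictive entropy over a pretrained segmentation model's class posterior, thresholded into the "hard" and "easy" sets that drive the two-stage schedule. All shapes, function names, and the quantile threshold are assumptions for illustration; the paper's actual recipe may differ.

```python
# Sketch of an entropy-based uncertainty map and a "hard-to-easy" split,
# assuming per-point class logits from a pretrained segmentation model.
# Shapes, names, and the quantile knob are illustrative, not the authors'.
import torch
import torch.nn.functional as F

def uncertainty_map(seg_logits: torch.Tensor) -> torch.Tensor:
    """Per-point predictive entropy.

    seg_logits: (N, C) class logits for N LiDAR points.
    Returns an (N,) entropy map; high values mark 'hard' regions.
    """
    probs = F.softmax(seg_logits, dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)

def split_hard_easy(entropy: torch.Tensor, q: float = 0.8):
    """High-entropy points are modeled first (stage 1); the rest are
    completed second (stage 2). The quantile q is an assumed threshold."""
    tau = torch.quantile(entropy, q)
    hard = entropy >= tau
    return hard, ~hard
```

The MoST block is described only as adaptively fusing spatial and temporal representations during diffusion; a simple hypothetical stand-in is a learned element-wise gate over the two branches:

```python
import torch
import torch.nn as nn

class GatedSpatioTemporalFusion(nn.Module):
    """Hypothetical stand-in for the MoST block: a sigmoid gate mixes a
    spatial branch and a temporal branch per feature element. The real
    block's architecture is not specified in the abstract."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor):
        # spatial, temporal: (..., dim) features from the two branches
        g = self.gate(torch.cat([spatial, temporal], dim=-1))
        return g * spatial + (1.0 - g) * temporal
```

A gate of this form would let the model lean on the spatial branch in static regions and the temporal branch where motion dominates, consistent with the abstract's stated goal of temporal coherence.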
Related papers
- Orthogonal Spatial-temporal Distributional Transfer for 4D Generation [58.30004328671699]
We propose a framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.
arXiv Detail & Related papers (2026-03-05T11:52:21Z)
- RAYNOVA: Scale-Temporal Autoregressive World Modeling in Ray Space [51.441415833480505]
RAYNOVA is a multiview world model for driving scenarios that employs a dual-causal autoregressive framework. It constructs an isotropic scale-temporal representation across views, frames, and scales based on relative Plücker-ray positional encoding.
arXiv Detail & Related papers (2026-02-24T08:41:40Z)
- Spatial-Temporal State Propagation Autoregressive Model for 4D Object Generation [19.913442608499366]
A spatial-temporal state propagation autoregressive model (4DSTAR) is proposed, which achieves spatial-temporally consistent generation. Experiments demonstrate that 4DSTAR generates spatial-temporally consistent 4D objects and achieves performance competitive with diffusion models.
arXiv Detail & Related papers (2026-02-21T13:21:21Z)
- SS4D: Native 4D Generative Model via Structured Spacetime Latents [50.29500511908054]
We present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. We train a generator directly on 4D data, achieving high fidelity, temporal coherence, and structural consistency.
arXiv Detail & Related papers (2025-12-16T10:45:06Z)
- 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation [28.11338918279445]
We propose a novel 4D generation network called 4DSTR, which modulates generative 4D Gaussian Splatting with spatial-temporal rectification. Experiments demonstrate that our 4DSTR achieves state-of-the-art performance in video-to-4D generation, excelling in reconstruction quality, spatial-temporal consistency, and adaptation to rapid temporal movements.
arXiv Detail & Related papers (2025-11-10T15:57:03Z)
- 3D and 4D World Modeling: A Survey [104.20852751473392]
World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. We introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches. We discuss practical applications, identify open challenges, and highlight promising research directions.
arXiv Detail & Related papers (2025-09-04T17:59:58Z)
- Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation [3.1852855132066673]
Current approaches often struggle to maintain view consistency while handling complex scene dynamics. This framework is the first to leverage both the rich temporal priors of video diffusion models and the geometric awareness of reconstruction models. It significantly facilitates 4D generation and achieves higher quality (e.g., mPSNR, mSSIM) than existing methods.
arXiv Detail & Related papers (2025-08-11T08:55:47Z)
- DSG-World: Learning a 3D Gaussian World Model from Dual State Videos [14.213608866611784]
We present DSG-World, a novel end-to-end framework that explicitly constructs a 3D Gaussian world model from dual-state observations. Our approach builds dual segmentation-aware Gaussian fields and enforces bidirectional photometric and semantic consistency.
arXiv Detail & Related papers (2025-06-05T16:33:32Z)
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image. Our key insight is to distill pre-trained foundation models for consistent 4D scene representation. The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
- DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation [50.01520547454224]
Current generative models struggle to synthesize 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS). We propose DiST-4D, which disentangles the problem into two diffusion processes: DiST-T, which predicts future metric depth and multi-view RGB sequences directly from past observations, and DiST-S, which enables spatial NVS by training only on existing viewpoints while enforcing cycle consistency. Experiments demonstrate that DiST-4D achieves state-of-the-art performance in both temporal prediction and NVS tasks, while also delivering competitive performance in planning-related evaluations.
arXiv Detail & Related papers (2025-03-19T13:49:48Z)
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling [69.56581851211841]
We propose a novel Local 4D implicit Representation for Dynamic clothed humans, named LoRD.
Our key insight is to encourage the network to learn the latent codes of a local part-level representation.
LoRD has strong capability for representing 4D humans and outperforms state-of-the-art methods in practical applications.
arXiv Detail & Related papers (2022-08-18T03:49:44Z)