Learning to Generate 4D LiDAR Sequences
- URL: http://arxiv.org/abs/2509.11959v1
- Date: Mon, 15 Sep 2025 14:14:48 GMT
- Title: Learning to Generate 4D LiDAR Sequences
- Authors: Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi,
- Abstract summary: We present LiDARCrafter, a unified framework that converts free-form language into editable LiDAR sequences. LiDARCrafter achieves state-of-the-art fidelity, controllability, and temporal consistency, offering a foundation for LiDAR-based simulation and data augmentation.
- Score: 28.411253849111755
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While generative world models have advanced video and occupancy-based data synthesis, LiDAR generation remains underexplored despite its importance for accurate 3D perception. Extending generation to 4D LiDAR data introduces challenges in controllability, temporal stability, and evaluation. We present LiDARCrafter, a unified framework that converts free-form language into editable LiDAR sequences. Instructions are parsed into ego-centric scene graphs, which a tri-branch diffusion model transforms into object layouts, trajectories, and shapes. A range-image diffusion model generates the initial scan, and an autoregressive module extends it into a temporally coherent sequence. The explicit layout design further supports object-level editing, such as insertion or relocation. To enable fair assessment, we provide EvalSuite, a benchmark spanning scene-, object-, and sequence-level metrics. On nuScenes, LiDARCrafter achieves state-of-the-art fidelity, controllability, and temporal consistency, offering a foundation for LiDAR-based simulation and data augmentation.
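The abstract mentions a range-image diffusion model but does not describe the representation itself. As a minimal sketch, assuming the common spherical-projection convention (not necessarily the one LiDARCrafter uses), a point cloud can be rasterized into a range image as follows. The function name `points_to_range_image`, the 32 x 1024 resolution, and the field-of-view bounds are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def points_to_range_image(points, height=32, width=1024,
                          fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) point cloud to an (H, W) range image.

    Hypothetical helper: resolution and field of view are assumed
    defaults for illustration, not settings from the paper.
    """
    points = np.asarray(points, dtype=np.float64)
    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0                      # drop points at the sensor origin
    points, depth = points[keep], depth[keep]

    yaw = np.arctan2(points[:, 1], points[:, 0])   # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / depth)        # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)

    # Map azimuth to columns and elevation to rows (top row = highest beam).
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / (fov_up - fov_down) * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Write far points first so the nearest return per pixel is kept.
    order = np.argsort(-depth)
    image = np.zeros((height, width), dtype=np.float32)
    image[rows[order], cols[order]] = depth[order]
    return image


if __name__ == "__main__":
    cloud = np.array([[10.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
    img = points_to_range_image(cloud)
    print(img.shape, int(np.count_nonzero(img)))
```

Pixels with no return stay at zero. A generative model operating on such images can use standard 2D architectures, and the output can be back-projected to 3D points using the same angles.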
Related papers
- LaGen: Towards Autoregressive LiDAR Scene Generation [66.95324368583536]
We introduce LaGen, which to the best of our knowledge is the first framework capable of frame-by-frame autoregressive generation of long-horizon LiDAR scenes. LaGen is able to take a single-frame LiDAR input as a starting point and effectively utilize bounding box information as conditions to generate high-fidelity 4D scene point clouds.
arXiv Detail & Related papers (2025-11-26T10:39:16Z)
- A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation [41.43267776407459]
Text-to-LiDAR generation can customize 3D data with rich structures and diverse scenes for downstream tasks. However, the scarcity of Text-LiDAR pairs often causes insufficient training priors, generating overly smooth 3D scenes. We propose a Text-to-LiDAR Diffusion Model for scene generation, named T2LDM, with a Self-Conditioned Representation Guidance (SCRG)
arXiv Detail & Related papers (2025-11-24T11:32:15Z)
- LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving [8.465161411966761]
LiSTAR is a novel generative world model that operates directly on the sensor's native geometry. LiSTAR captures complex dynamics from sparse temporal data. Experiments validate LiSTAR's performance across 4D LiDAR reconstruction, prediction, and conditional generation.
arXiv Detail & Related papers (2025-11-20T05:11:22Z)
- 3D and 4D World Modeling: A Survey [104.20852751473392]
World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. We introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches. We discuss practical applications, identify open challenges, and highlight promising research directions.
arXiv Detail & Related papers (2025-09-04T17:59:58Z)
- LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences [28.411253849111755]
LiDARCrafter is a unified framework for 4D LiDAR generation and editing. It achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels. The code and benchmark are released to the community.
arXiv Detail & Related papers (2025-08-05T17:59:56Z)
- La La LiDAR: Large-Scale Layout Generation from LiDAR Data [45.5317990948996]
Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. We propose Large-scale Layout-guided LiDAR generation model ("La La LiDAR"), a novel layout-guided generative framework. La La LiDAR achieves state-of-the-art performance in both LiDAR generation and downstream perception tasks.
arXiv Detail & Related papers (2025-08-05T17:59:55Z)
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
- LiDAR-GS:Real-time LiDAR Re-Simulation using Gaussian Splatting [50.808933338389686]
We present LiDAR-GS, a real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes. The method achieves state-of-the-art results in both rendering frame rate and quality on publicly available large scene datasets.
arXiv Detail & Related papers (2024-10-07T15:07:56Z)
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
- Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem [12.26293873825084]
We propose to leverage pseudo-LiDAR point clouds generated from videos capturing a surround view of miniatures or real-world objects of minor classes.
Our method, called Pseudo Ground Truth Augmentation (PGT-Aug), consists of three main steps: (i) volumetric 3D instance reconstruction using a 2D-to-3D view synthesis model, (ii) object-level domain alignment with LiDAR intensity estimation, and (iii) a hybrid context-aware placement method from ground and map information.
arXiv Detail & Related papers (2024-03-18T08:50:04Z)