LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
- URL: http://arxiv.org/abs/2508.03692v1
- Date: Tue, 05 Aug 2025 17:59:56 GMT
- Title: LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
- Authors: Ao Liang, Youquan Liu, Yu Yang, Dongyue Lu, Linfeng Li, Lingdong Kong, Huaici Zhao, Wei Tsang Ooi
- Abstract summary: LiDARCrafter is a unified framework for 4D LiDAR generation and editing. It achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels. The code and benchmark are released to the community.
- Score: 10.426609103049572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique LiDAR properties. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are released to the community.
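As a rough illustration of the pipeline described in the abstract, the sketch below models an ego-centric scene graph and a toy language parser. All class names, fields, and the keyword-based parsing are assumptions for illustration only; the paper's actual schema and tri-branch diffusion interfaces are not specified here.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    category: str        # e.g. "car", "pedestrian" (hypothetical label set)
    size: tuple          # (length, width, height) in meters
    position: tuple      # (x, y) in the ego frame
    heading: float       # yaw relative to ego, in radians

@dataclass
class EgoSceneGraph:
    nodes: list = field(default_factory=list)   # ObjectNode instances
    edges: list = field(default_factory=list)   # (node_i, node_j, relation) triples

def parse_instruction(text: str) -> EgoSceneGraph:
    """Toy stand-in for the language parser; a real system would extract
    objects and spatial relations with an LLM or a grammar, not keywords."""
    g = EgoSceneGraph()
    if "car" in text:
        g.nodes.append(ObjectNode("car", (4.5, 1.9, 1.6), (10.0, 0.0), 0.0))
    if "ahead" in text and g.nodes:
        g.edges.append((0, -1, "ahead_of_ego"))  # -1 denotes the ego node
    return g

graph = parse_instruction("a car driving ahead of the ego vehicle")
# The tri-branch diffusion model would condition on this graph to produce
# (1) object layouts, (2) motion trajectories, and (3) point-level geometry.
print(graph)
```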
Related papers
- La La LiDAR: Large-Scale Layout Generation from LiDAR Data [45.5317990948996]
Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. We propose the Large-scale Layout-guided LiDAR generation model ("La La LiDAR"), a novel layout-guided generative framework. La La LiDAR achieves state-of-the-art performance in both LiDAR generation and downstream perception tasks.
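Layout-guided generation of this kind is often conditioned on a rasterized bird's-eye-view (BEV) map. The sketch below is a minimal, assumed version of that step (axis-aligned boxes, fixed grid); it is not La La LiDAR's actual pipeline.

```python
import numpy as np

def rasterize_layout(boxes, grid=256, extent=50.0):
    """boxes: list of (cx, cy, length, width) in meters, ego at grid center.
    Boxes are treated as axis-aligned (heading ignored) to keep this short."""
    bev = np.zeros((grid, grid), dtype=np.float32)
    scale = grid / (2 * extent)
    for cx, cy, l, w in boxes:
        x0 = int((cx - l / 2 + extent) * scale)
        x1 = int((cx + l / 2 + extent) * scale)
        y0 = int((cy - w / 2 + extent) * scale)
        y1 = int((cy + w / 2 + extent) * scale)
        bev[max(y0, 0):min(y1, grid), max(x0, 0):min(x1, grid)] = 1.0
    return bev

# Two hypothetical vehicle boxes; the resulting map would condition the generator.
cond = rasterize_layout([(10.0, 0.0, 4.5, 1.9), (-5.0, 3.0, 4.5, 1.9)])
print(int(cond.sum()), "occupied cells")
```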
arXiv Detail & Related papers (2025-08-05T17:59:55Z)
- Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency [32.16082566679126]
We present a unified framework for joint generation of driving videos and LiDAR sequences. We employ a two-stage architecture that integrates a DiT-based video diffusion model with 3D-VAE encoding, and a BEV-aware LiDAR generator with NeRF-based rendering and adaptive sampling. To guide the generation with structured semantics, we introduce DataCrafter, a captioning module built on vision-language models that provides scene-level and instance-level supervision.
arXiv Detail & Related papers (2025-06-09T07:20:49Z)
- UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction [27.334884564978907]
We propose UnIRe, a 3D Gaussian Splatting (3DGS) based approach that decomposes a scene into a static background and individual dynamic instances. At its core, we introduce 4D superpoints, a novel representation that clusters multi-frame LiDAR points in 4D space. Experiments show that our method outperforms existing methods in dynamic scene reconstruction while enabling accurate and flexible instance-level editing.
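A minimal sketch of the 4D-superpoint idea: stack ego-aligned LiDAR frames, append a scaled time coordinate, and cluster in 4D. DBSCAN and the time weighting below are assumptions; UnIRe's actual grouping may differ.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def superpoints_4d(frames, dt=0.1, time_weight=2.0, eps=0.5):
    """frames: list of (N_i, 3) point arrays already aligned to a world frame.
    Time is appended as a fourth, re-weighted coordinate before clustering."""
    pts4d = np.concatenate([
        np.hstack([p, np.full((len(p), 1), i * dt * time_weight)])
        for i, p in enumerate(frames)
    ])
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(pts4d)
    return pts4d, labels  # points sharing a label form one 4D superpoint

frames = [np.random.rand(200, 3) * 5 for _ in range(5)]  # toy aligned frames
_, labels = superpoints_4d(frames)
print(len(set(labels)) - (1 if -1 in labels else 0), "superpoints found")
```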
arXiv Detail & Related papers (2025-04-01T13:15:58Z)
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
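Cross-modal consistency pretraining of this kind is commonly driven by a contrastive objective between matched point and pixel features. The sketch below uses a generic InfoNCE formulation, which is an assumption rather than SuperFlow++'s exact loss.

```python
import numpy as np

def info_nce(point_feats, pixel_feats, tau=0.07):
    """Both inputs: (N, D); row i of each array is a matched point/pixel pair."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    q = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    logits = p @ q.T / tau                       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs lie on the diagonal

loss = info_nce(np.random.randn(64, 128), np.random.randn(64, 128))
print(f"contrastive loss: {loss:.3f}")
```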
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
- OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving [74.06413946934002]
We introduce OLiDM, a novel framework capable of generating high-fidelity LiDAR data at both the object and the scene levels. OLiDM consists of two pivotal components: the Object-Scene Progressive Generation (OPG) module and the Object Semantic Alignment (OSA) module. OPG adapts to user-specific prompts to generate desired foreground objects, which are subsequently employed as conditions in scene generation. OSA aims to rectify the misalignment between foreground objects and background scenes, enhancing the overall quality of the generated objects.
arXiv Detail & Related papers (2024-12-23T02:43:29Z)
- DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes [61.07023022220073]
We introduce DynamicCity, a novel 4D occupancy generation framework capable of generating large-scale, high-quality dynamic 4D scenes with semantics. In particular, DynamicCity employs a novel Projection Module to effectively compress 4D features into six 2D feature maps for HexPlane construction. We utilize an Expansion & Squeeze Strategy to reconstruct 3D feature volumes in parallel, which improves both network training efficiency and reconstruction accuracy.
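The six 2D feature maps correspond to a HexPlane factorization of a 4D field over the axis pairs (xy, xz, yz, xt, yt, zt). The sketch below shows a nearest-neighbor HexPlane query; DynamicCity's learned Projection Module and Expansion & Squeeze Strategy are more involved.

```python
import numpy as np

PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # axis index pairs

class HexPlane:
    def __init__(self, res=32, dim=8, rng=np.random.default_rng(0)):
        self.res = res
        self.planes = [rng.standard_normal((res, res, dim)) for _ in PAIRS]

    def query(self, coords01):
        """coords01: (N, 4) points, each coordinate normalized to [0, 1)."""
        idx = np.clip((coords01 * self.res).astype(int), 0, self.res - 1)
        feat = np.ones((len(coords01), self.planes[0].shape[-1]))
        for plane, (a, b) in zip(self.planes, PAIRS):
            feat *= plane[idx[:, a], idx[:, b]]  # combine by elementwise product
        return feat

hp = HexPlane()
print(hp.query(np.random.rand(4, 4)).shape)  # (4, 8) features for 4 query points
```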
arXiv Detail & Related papers (2024-10-23T17:59:58Z)
- LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting [50.808933338389686]
We present LiDAR-GS, a method for real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes. The method achieves state-of-the-art results in both rendering frame rate and quality on publicly available large scene datasets.
arXiv Detail & Related papers (2024-10-07T15:07:56Z)
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning. Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
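The laser-beam manipulation at the core of LaserMix-style mixing can be sketched as swapping interleaved inclination-angle bands between two scans. The band count and field-of-view below are illustrative, and the LiDAR-camera correspondence component of LaserMix++ is omitted.

```python
import numpy as np

def laser_mix(scan_a, scan_b, n_bands=6, fov=(-0.4363, 0.1745)):  # ~(-25, +10) deg
    """scan_a, scan_b: (N, 3) point clouds from two different scenes."""
    def bands(scan):
        # Inclination angle of each point relative to the sensor's horizontal plane.
        inclination = np.arctan2(scan[:, 2], np.linalg.norm(scan[:, :2], axis=1))
        edges = np.linspace(fov[0], fov[1], n_bands + 1)
        return np.clip(np.digitize(inclination, edges) - 1, 0, n_bands - 1)
    ba, bb = bands(scan_a), bands(scan_b)
    # Even-indexed bands come from scan A, odd-indexed bands from scan B.
    return np.vstack([scan_a[ba % 2 == 0], scan_b[bb % 2 == 1]])

a = np.random.randn(1000, 3)  # toy scans; real inputs would be ego-frame points
b = np.random.randn(1000, 3)
print(laser_mix(a, b).shape)
```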
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
- LidarDM: Generative LiDAR Simulation in a Generated World [21.343346521878864]
LidarDM is a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos.
We employ latent diffusion models to generate the 3D scene, combine it with dynamic actors to form the underlying 4D world, and subsequently produce realistic sensory observations within this virtual environment.
Our experiments indicate that our approach outperforms competing algorithms in realism, temporal coherency, and layout consistency.
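The final step, producing sensory observations from the composed 4D world, amounts to raycasting from the sensor origin. The sketch below casts rays against toy spheres standing in for the generated scene and dynamic actors; it is a simplification, not LidarDM's renderer.

```python
import numpy as np

def raycast_lidar(spheres, n_azim=360, n_elev=16, max_range=80.0):
    """spheres: list of (center(3,), radius). Returns an (n_elev, n_azim) range image."""
    az = np.linspace(-np.pi, np.pi, n_azim, endpoint=False)
    el = np.linspace(np.radians(-25), np.radians(5), n_elev)
    A, E = np.meshgrid(az, el)
    dirs = np.stack([np.cos(E) * np.cos(A), np.cos(E) * np.sin(A), np.sin(E)], -1)
    ranges = np.full(A.shape, max_range)
    for c, r in spheres:
        b = dirs @ c                    # ray-sphere intersection, sensor at origin
        disc = b**2 - (c @ c - r**2)
        hit = disc > 0
        t = b - np.sqrt(np.where(hit, disc, 0.0))
        ranges = np.where(hit & (t > 0), np.minimum(ranges, t), ranges)
    return ranges

scan = raycast_lidar([(np.array([10.0, 0.0, 0.0]), 1.0)])
print(scan.min(), scan.max())  # nearest return ~9 m; misses stay at max_range
```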
arXiv Detail & Related papers (2024-04-03T17:59:28Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the robustness of state-of-the-art fusion methods for the first time.
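A robustness benchmark of this kind typically corrupts one modality at a time and measures the drop against a clean baseline. The protocol below is a generic, assumed sketch; the model and metric are placeholders, not the paper's evaluation code.

```python
import numpy as np

def drop_points(points, keep=0.5, rng=np.random.default_rng(0)):
    """Randomly discard LiDAR points to simulate sparsity or sensor dropout."""
    return points[rng.random(len(points)) < keep]

def add_image_noise(image, sigma=0.1, rng=np.random.default_rng(0)):
    """Add Gaussian noise to the camera image."""
    return image + rng.normal(0, sigma, image.shape)

def benchmark(model, metric, image, points, corruptions):
    """model: fusion detector taking (image, points); metric: e.g. mAP callable."""
    scores = {"clean": metric(model(image, points))}
    for name, fn in corruptions.items():
        im, pt = fn(image, points)
        scores[name] = metric(model(im, pt))
    return scores  # compare each corrupted score against the clean baseline

corruptions = {
    "lidar_drop":  lambda im, pt: (im, drop_points(pt)),
    "image_noise": lambda im, pt: (add_image_noise(im), pt),
}
# benchmark(my_fusion_model, my_map_metric, image, points, corruptions)
```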
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
We extend DS-Net to 4D panoptic LiDAR segmentation through temporally unified instance clustering on aligned LiDAR frames. The proposed DS-Net achieves superior accuracy over current state-of-the-art methods in both tasks.
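The dynamic shifting step can be pictured as mean-shift-style clustering: each foreground point is iteratively moved toward its local density peak, and converged points form instances. DS-Net learns per-point bandwidths; the fixed-bandwidth sketch below is a simplification.

```python
import numpy as np

def dynamic_shift(points, bandwidth=1.0, iters=4):
    """points: (N, 3) foreground points. Returns per-point shifted centers."""
    shifted = points.copy()
    for _ in range(iters):
        d = np.linalg.norm(shifted[:, None] - shifted[None, :], axis=-1)
        kernel = (d < bandwidth).astype(float)       # flat-kernel neighborhood
        shifted = kernel @ shifted / kernel.sum(1, keepdims=True)
    return shifted  # points of one instance collapse toward the same center

# For the 4D extension, frames would first be ego-motion aligned and clustered
# jointly so that instance IDs stay consistent over time.
pts = np.vstack([np.random.randn(50, 3) * 0.3 + c
                 for c in ([0, 0, 0], [5, 5, 0])])
centers = dynamic_shift(pts)
print(np.unique(np.round(centers, 1), axis=0).shape[0], "approx. cluster centers")
```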
arXiv Detail & Related papers (2022-03-14T15:25:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.