A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation
- URL: http://arxiv.org/abs/2511.19004v1
- Date: Mon, 24 Nov 2025 11:32:15 GMT
- Title: A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation
- Authors: Wentao Qu, Guofeng Mei, Yang Wu, Yongshun Gong, Xiaoshui Huang, Liang Xiao
- Abstract summary: Text-to-LiDAR generation can customize 3D data with rich structures and diverse scenes for downstream tasks. However, the scarcity of Text-LiDAR pairs often causes insufficient training priors, generating overly smooth 3D scenes. We propose a Text-to-LiDAR Diffusion Model for scene generation, named T2LDM, with Self-Conditioned Representation Guidance (SCRG).
- Score: 41.43267776407459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-LiDAR generation can customize 3D data with rich structures and diverse scenes for downstream tasks. However, the scarcity of Text-LiDAR pairs often leads to insufficient training priors, yielding overly smooth 3D scenes. Moreover, low-quality text descriptions may degrade generation quality and controllability. In this paper, we propose a Text-to-LiDAR Diffusion Model for scene generation, named T2LDM, with Self-Conditioned Representation Guidance (SCRG). Specifically, by aligning to representations of the real scene, SCRG provides soft supervision with reconstruction details for the Denoising Network (DN) during training, while being decoupled from it at inference. In this way, T2LDM can perceive rich geometric structures from the data distribution, generating detailed objects in scenes. Meanwhile, we construct a content-composable Text-LiDAR benchmark, T2nuScenes, along with a controllability metric. Based on this, we analyze the effects of different text prompts on LiDAR generation quality and controllability, providing practical prompt paradigms and insights. Furthermore, a directional position prior is designed to mitigate street distortion, further improving scene fidelity. Additionally, by learning a conditional encoder on top of the frozen DN, T2LDM supports multiple conditional tasks, including Sparse-to-Dense, Dense-to-Sparse, and Semantic-to-LiDAR generation. Extensive experiments on unconditional and conditional generation demonstrate that T2LDM outperforms existing methods, achieving state-of-the-art scene generation.
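The following is a minimal PyTorch sketch of the SCRG training idea as the abstract describes it: during training, the Denoising Network receives an extra alignment loss toward an encoding of the real (clean) scene, and this branch is dropped at inference. Everything here is an illustrative assumption rather than the paper's implementation: the stub `Denoiser`, the representation encoder, the feature-alignment point, and the loss weight `lam` are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Stand-in Denoising Network (DN): predicts noise from (x_t, t, text)."""
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim + 1 + dim, hidden), nn.SiLU())
        self.head = nn.Linear(hidden, dim)

    def forward(self, x_t, t, text_emb):
        h = self.backbone(torch.cat([x_t, t[:, None].float(), text_emb], dim=-1))
        return self.head(h), h  # (predicted noise, intermediate DN features)

def add_noise(x0, noise, t, alphas_cumprod):
    """Forward diffusion q(x_t | x_0) under a cumulative-alpha schedule."""
    a = alphas_cumprod[t][:, None]
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

def training_step(dn, repr_encoder, x0, text_emb, alphas_cumprod, lam=0.1):
    """One training step: denoising loss plus representation-alignment loss."""
    t = torch.randint(0, len(alphas_cumprod), (x0.size(0),))
    noise = torch.randn_like(x0)
    x_t = add_noise(x0, noise, t, alphas_cumprod)
    pred_noise, dn_feats = dn(x_t, t, text_emb)

    with torch.no_grad():                 # encode the REAL (clean) scene
        real_feats = repr_encoder(x0)     # soft-supervision target
    loss_diff = F.mse_loss(pred_noise, noise)
    loss_scrg = F.mse_loss(dn_feats, real_feats)  # used in training only
    return loss_diff + lam * loss_scrg

# Toy usage: random tensors stand in for LiDAR latents and text embeddings.
dn, enc = Denoiser(), nn.Linear(64, 256)      # enc: stand-in representation encoder
schedule = torch.linspace(0.999, 0.01, 1000)  # toy cumulative-alpha schedule
loss = training_step(dn, enc, torch.randn(8, 64), torch.randn(8, 64), schedule)
loss.backward()
```

Because the alignment term exists only in the training loss, sampling runs the DN alone and the guidance adds no inference cost, matching the abstract's claim that SCRG is decoupled at inference. A similar frozen-DN-plus-trainable-encoder arrangement could plausibly serve the conditional tasks mentioned, though the paper's exact design may differ.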
Related papers
- Learning to Generate 4D LiDAR Sequences [28.411253849111755]
We present LiDARCrafter, a unified framework that converts free-form language into editable LiDAR sequences. LiDARCrafter achieves state-of-the-art fidelity, controllability, and temporal consistency, offering a foundation for LiDAR-based simulation and data augmentation.
arXiv Detail & Related papers (2025-09-15T14:14:48Z) - LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences [28.411253849111755]
LiDARCrafter is a unified framework for 4D LiDAR generation and editing. It achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels. The code and benchmark are released to the community.
arXiv Detail & Related papers (2025-08-05T17:59:56Z) - La La LiDAR: Large-Scale Layout Generation from LiDAR Data [45.5317990948996]
Controllable generation of realistic LiDAR scenes is crucial for applications such as autonomous driving and robotics. We propose "La La LiDAR", a novel large-scale layout-guided generative framework. La La LiDAR achieves state-of-the-art performance in both LiDAR generation and downstream perception tasks.
arXiv Detail & Related papers (2025-08-05T17:59:55Z) - Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors [52.63385546943866]
We present a text-to-scene generation method (namely, Layout2Scene) using additional semantic layout as the prompt to inject precise control of 3D object positions. To fully leverage 2D diffusion priors in geometry and appearance generation, we introduce a semantic-guided geometry diffusion model and a semantic-geometry guided diffusion model. Our method can generate more plausible and realistic scenes compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-05T12:20:13Z) - OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving [74.06413946934002]
We introduce OLiDM, a novel framework capable of generating high-fidelity LiDAR data at both the object and the scene levels. OLiDM consists of two pivotal components: the Object-Scene Progressive Generation (OPG) module and the Object Semantic Alignment (OSA) module. OPG adapts to user-specific prompts to generate desired foreground objects, which are subsequently employed as conditions in scene generation. OSA aims to rectify the misalignment between foreground objects and background scenes, enhancing the overall quality of the generated objects.
arXiv Detail & Related papers (2024-12-23T02:43:29Z) - GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space [64.82017974849697]
We train a feed-forward text-to-3D diffusion generator for human characters using only single-view 2D data for supervision. GANFusion starts by generating unconditional triplane features for 3D data using a GAN architecture trained with only single-view 2D data.
arXiv Detail & Related papers (2024-12-21T17:59:17Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting [53.58528891081709]
We present LiDAR-GS, a real-time, high-fidelity re-simulation of LiDAR scans in public urban road scenes. The method achieves state-of-the-art results in both rendering frame rate and quality on publicly available large-scene datasets.
arXiv Detail & Related papers (2024-10-07T15:07:56Z) - Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer [38.18396501696647]
We propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model.
We design an equirectangular transformer architecture, utilizing the designed equirectangular attention to capture LiDAR features (a short sketch of the equirectangular range-image projection follows this list).
We construct nuLiDARtext which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes.
arXiv Detail & Related papers (2024-07-29T01:18:47Z) - Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT [120.39362661689333]
We present an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency.
Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities.
arXiv Detail & Related papers (2024-06-05T17:53:26Z) - Towards Realistic Scene Generation with LiDAR Diffusion Models [15.487070964070165]
Diffusion models (DMs) excel in photo-realistic image synthesis, but their adaptation to LiDAR scene generation poses a substantial hurdle.
We propose LiDAR Diffusion Models (LiDMs) to generate LiDAR-realistic scenes from a latent space tailored to capture the realism of LiDAR scenes.
Specifically, we introduce curve-wise compression to simulate real-world LiDAR patterns, point-wise coordinate supervision to learn scene geometry, and patch-wise encoding for a full 3D object context.
arXiv Detail & Related papers (2024-03-31T22:18:56Z)
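Several entries above (Text2LiDAR, LiDAR Diffusion Models) generate LiDAR in an equirectangular or range-image view rather than as raw point sets. As background, here is a minimal NumPy sketch of that projection; the 32x1024 resolution and the field-of-view bounds are illustrative assumptions, not values taken from any paper listed here.

```python
import numpy as np

def points_to_range_image(points, h=32, w=1024,
                          fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) LiDAR point cloud to an equirectangular range image.

    Rows index elevation, columns index azimuth; each pixel stores range in
    meters. Resolution and FOV here are illustrative, not sensor-exact.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                               # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = (0.5 * (1.0 - yaw / np.pi) * w).astype(int) % w  # column index
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(int)

    img = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)            # write far points first so near ones win
    img[v[order], u[order]] = r[order]
    return img

# Toy usage: 10,000 random points in a 100 m cube centered on the sensor.
cloud = (np.random.rand(10000, 3) - 0.5) * 100.0
print(points_to_range_image(cloud).shape)  # (32, 1024)
```

Treating a scan this way turns LiDAR generation into a 2D image-synthesis problem, which is what lets image-style diffusion architectures transfer to LiDAR data.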
This list is automatically generated from the titles and abstracts of the papers on this site.