X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
- URL: http://arxiv.org/abs/2506.13558v1
- Date: Mon, 16 Jun 2025 14:43:18 GMT
- Title: X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
- Authors: Yu Yang, Alan Liang, Jianbiao Mei, Yukai Ma, Yong Liu, Gim Hee Lee
- Abstract summary: X-Scene is a novel framework for large-scale driving scene generation. It achieves both geometric intricacy and appearance fidelity, while offering flexible controllability. X-Scene significantly advances controllability and fidelity for large-scale driving scene generation.
- Score: 49.4647778989539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, the generation of large-scale 3D scenes that require spatial coherence remains underexplored. In this paper, we propose X-Scene, a novel framework for large-scale driving scene generation that achieves both geometric intricacy and appearance fidelity, while offering flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level conditions such as user-provided or text-driven layout for detailed scene composition and high-level semantic guidance such as user-intent and LLM-enriched text prompts for efficient customization. To enhance geometrical and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and the corresponding multiview images, while ensuring alignment between modalities. Additionally, we extend the generated local region into a large-scale scene through consistency-aware scene outpainting, which extrapolates new occupancy and images conditioned on the previously generated area, enhancing spatial continuity and preserving visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as scene exploration. Comprehensive experiments demonstrate that X-Scene significantly advances controllability and fidelity for large-scale driving scene generation, empowering data generation and simulation for autonomous driving.
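The abstract describes a staged pipeline: generate 3D semantic occupancy, then occupancy-aligned multiview images, then grow the scene outward by outpainting conditioned on the previously generated region. A minimal structural sketch of that flow is below; the function names, data shapes, and the six-view count are illustrative assumptions, with placeholder stubs standing in for the paper's diffusion models, not X-Scene's actual API.

```python
# Illustrative sketch of the staged pipeline from the abstract:
# occupancy -> multiview images -> consistency-aware outpainting.
# All names and shapes here are assumptions; the stubs stand in for
# the paper's occupancy and image diffusion models.
from dataclasses import dataclass, field


@dataclass
class Region:
    occupancy: str                               # stand-in for a 3D semantic occupancy grid
    images: list = field(default_factory=list)   # stand-in for multiview renders


def generate_occupancy(condition: str) -> str:
    # Placeholder for the occupancy-generation stage (layout- or text-conditioned).
    return f"occ({condition})"


def generate_images(occupancy: str) -> list:
    # Placeholder for occupancy-conditioned multiview image generation;
    # conditioning on occupancy is what keeps the two modalities aligned.
    return [f"view{i}|{occupancy}" for i in range(6)]


def outpaint(prev: Region, direction: str) -> Region:
    # New occupancy and images are conditioned on the previously generated
    # region, which is what preserves spatial continuity across the scene.
    occ = generate_occupancy(f"{prev.occupancy}->{direction}")
    return Region(occ, generate_images(occ))


def build_scene(layout: str, steps: int) -> list:
    seed = Region(generate_occupancy(layout))
    seed.images = generate_images(seed.occupancy)
    regions = [seed]
    for _ in range(steps):
        regions.append(outpaint(regions[-1], "forward"))
    return regions  # the paper then lifts these into a 3DGS representation


scene = build_scene("user_layout", steps=3)
```

The point of the sketch is the dependency order: images are conditioned on occupancy, and each outpainted region is conditioned on its predecessor, so consistency is enforced stage by stage rather than globally.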
Related papers
- NuiWorld: Exploring a Scalable Framework for End-to-End Controllable World Generation [19.342379757491177]
We present NuiWorld, a framework that attempts to address the challenges of world generation. We synthesize scenes of varying sizes and layouts, producing enough data to train an end-to-end model. Our framework enables controllability through pseudo sketch labels, and demonstrates a degree of generalization to previously unseen sketches.
arXiv Detail & Related papers (2026-01-27T00:04:02Z)
- Towards Efficient and Effective Multi-Camera Encoding for End-to-End Driving [54.85072592658933]
We present Flex, an efficient and effective scene encoder that addresses the computational bottleneck of processing high-volume multi-camera data in autonomous driving. By design, our approach is geometry-agnostic, learning a compact scene representation directly from data without relying on explicit 3D inductive biases. Our findings challenge the prevailing assumption that 3D priors are necessary, demonstrating that a data-driven, joint encoding strategy offers a more scalable, efficient, and effective path for future autonomous driving systems.
arXiv Detail & Related papers (2025-12-11T18:59:46Z)
- HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition [1.9131307324613616]
We propose HiGS, a hierarchical generative framework for multi-step associative semantic spatial composition. HiGS enables users to iteratively expand scenes by selecting key semantic objects, offering fine-grained control over regions of interest. To support structured and coherent generation, we introduce the Progressive Hierarchical Spatial-Semantic Graph.
arXiv Detail & Related papers (2025-10-31T03:50:47Z)
- Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
- IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion [15.837932667195037]
IGFuse is a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines.
arXiv Detail & Related papers (2025-08-18T17:59:47Z)
- RoomCraft: Controllable and Complete 3D Indoor Scene Generation [51.19602078504066]
RoomCraft is a multi-stage pipeline that converts real images, sketches, or text descriptions into coherent 3D indoor scenes. Our approach combines a scene generation pipeline with a constraint-driven optimization framework. RoomCraft significantly outperforms existing methods in generating realistic, semantically coherent, and visually appealing room layouts.
arXiv Detail & Related papers (2025-06-27T15:03:17Z)
- DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving [20.197094443215963]
We present DriveX, a self-supervised world model that learns general scene dynamics and holistic representations from driving videos. DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision: 3D point cloud forecasting, 2D semantic representation, and image generation. For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates features from DriveX's predictions to enhance task-specific inference.
arXiv Detail & Related papers (2025-05-25T17:27:59Z)
- BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation [16.00575923179227]
3D scenes have highly complex structures, and generation must ensure that the output is dense, coherent, and contains all necessary structures. Current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. We propose BloomScene, a lightweight structured 3D Gaussian splatting method for crossmodal scene generation.
arXiv Detail & Related papers (2025-01-15T11:33:34Z)
- UniScene: Unified Occupancy-centric Driving Scene Generation [73.22859345600192]
We introduce UniScene, the first unified framework for generating three key data forms: semantic occupancy, video, and LiDAR. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps. Extensive experiments demonstrate that UniScene outperforms previous SOTAs in occupancy, video, and LiDAR generation.
arXiv Detail & Related papers (2024-12-06T21:41:52Z)
- InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z)
- AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction [17.600027937450342]
AutoSplat is a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes.
Our method enables multi-view consistent simulation of challenging scenarios including lane changes.
arXiv Detail & Related papers (2024-07-02T18:36:50Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- SceneGen: Learning to Generate Realistic Traffic Scenes [92.98412203941912]
We present SceneGen, a neural autoregressive model of traffic scenes that eschews the need for rules and distributions.
We demonstrate SceneGen's ability to faithfully model distributions of real traffic scenes.
arXiv Detail & Related papers (2021-01-16T22:51:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.