GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation
- URL: http://arxiv.org/abs/2602.20673v1
- Date: Tue, 24 Feb 2026 08:22:42 GMT
- Title: GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation
- Authors: Hao Zhang, Lue Fan, Qitai Wang, Wenbo Li, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li
- Abstract summary: We present GA-Drive, a novel simulation framework capable of generating camera views along user-specified novel trajectories. GA-Drive synthesizes novel pseudo-views using geometry information. These pseudo-views are then transformed into photorealistic views using a trained video diffusion model.
- Score: 62.07995406671134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A free-viewpoint, editable, and high-fidelity driving simulator is crucial for training and evaluating end-to-end autonomous driving systems. In this paper, we present GA-Drive, a novel simulation framework capable of generating camera views along user-specified novel trajectories through Geometry-Appearance Decoupling and Diffusion-Based Generation. Given a set of images captured along a recorded trajectory and the corresponding scene geometry, GA-Drive synthesizes novel pseudo-views using geometry information. These pseudo-views are then transformed into photorealistic views using a trained video diffusion model. In this way, we decouple the geometry and appearance of scenes. An advantage of such decoupling is its support for appearance editing via state-of-the-art video-to-video editing techniques, while preserving the underlying geometry, enabling consistent edits across both original and novel trajectories. Extensive experiments demonstrate that GA-Drive substantially outperforms existing methods in terms of NTA-IoU, NTL-IoU, and FID scores.
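The abstract describes a two-stage pipeline: scene geometry is first re-projected into the user-specified novel camera to form pseudo-views, and a trained video diffusion model then restores photorealistic appearance. Below is a minimal sketch of that decoupling, assuming the scene geometry is available as a colored point cloud and treating the diffusion model as a black box; `render_pseudo_view`, `refine`, and all shapes are illustrative assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch of geometry-appearance decoupling (not the authors' code).
import numpy as np

def render_pseudo_view(points_world, K, T_world_to_cam, hw=(256, 512)):
    """Stage 1: splat a colored point cloud (N x 6: xyz + rgb) into a novel
    camera pose, yielding a sparse, artifact-prone pseudo-view."""
    h, w = hw
    pts = np.concatenate([points_world[:, :3], np.ones((len(points_world), 1))], axis=1)
    cam = (T_world_to_cam @ pts.T).T[:, :3]   # world -> camera frame
    front = cam[:, 2] > 1e-3                  # keep points in front of the camera
    cam, rgb = cam[front], points_world[front, 3:6]
    uvz = (K @ cam.T).T                       # pinhole projection
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, rgb = u[ok], v[ok], cam[ok, 2], rgb[ok]
    img = np.zeros((h, w, 3), dtype=np.float32)
    order = np.argsort(-z)                    # draw far-to-near so near points win
    img[v[order], u[order]] = rgb[order]
    return img

def refine(pseudo_views, video_diffusion):
    """Stage 2: a trained video diffusion model (a black-box callable here)
    maps geometry-faithful pseudo-views to photorealistic frames."""
    return video_diffusion(np.stack(pseudo_views))
```

The decoupling is visible in the interface: appearance edits only touch the `rgb` channels (or the diffusion conditioning), while the projection geometry stays fixed, which is what allows edits to remain consistent across both the original and novel trajectories.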
Related papers
- HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles [63.88996084630768]
Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes. Experiments show that the Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations.
arXiv Detail & Related papers (2026-02-24T20:03:47Z)
- Visual Implicit Geometry Transformer for Autonomous Driving [7.795200422563638]
We introduce the Visual Implicit Geometry Transformer (ViGT), a geometric model for autonomous driving. ViGT estimates a continuous 3D occupancy field in a bird's-eye-view (BEV) representation, addressing domain-specific requirements. We validate the scalability and generalizability of our approach by training our model on a mixture of five large-scale autonomous driving datasets.
arXiv Detail & Related papers (2026-02-05T11:54:38Z)
- DVGT: Driving Visual Geometry Transformer [63.38483879291505]
A driving-targeted dense geometry perception model can adapt to different scenarios and camera configurations. We propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs. DVGT is free of explicit 3D geometric priors, enabling flexible processing of arbitrary camera configurations.
arXiv Detail & Related papers (2025-12-18T18:59:57Z)
- Vid-CamEdit: Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry [41.904066758259624]
We introduce Vid-CamEdit, a novel framework for video camera trajectory editing. Our approach consists of two steps: estimating temporally consistent geometry, and generative rendering guided by this geometry.
arXiv Detail & Related papers (2025-06-16T17:02:47Z)
- Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention [61.3281618482513]
We present CogDriving, a novel network designed for synthesizing high-quality multi-view driving videos. CogDriving leverages a Diffusion Transformer architecture with holistic-4D attention modules, enabling simultaneous associations across the spatial, temporal, and viewpoint dimensions (a sketch of this idea follows the list below). CogDriving demonstrates strong performance on the nuScenes validation set, achieving an FVD score of 37.8, highlighting its ability to generate realistic driving videos.
arXiv Detail & Related papers (2024-12-04T18:02:49Z)
- DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation [10.296670127024045]
DriveScape is an end-to-end framework for multi-view, 3D condition-guided video generation.
Our Bi-Directional Modulated Transformer (BiMot) ensures precise alignment of 3D structural information.
DriveScape excels in video generation performance, achieving state-of-the-art results on the nuScenes dataset with an FID score of 8.34 and an FVD score of 76.39.
arXiv Detail & Related papers (2024-09-09T09:43:17Z)
- View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV [43.37259596065606]
We address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios.
Changes in the scene background not only render traditional frame-to-frame object IOU association methods ineffective but also introduce significant view shifts in the objects.
We propose HomView-MOT, a novel universal framework that, for the first time, harnesses the view homography inherent in changing scenes to solve MOT challenges (a sketch of homography-compensated association also follows the list below).
arXiv Detail & Related papers (2024-03-16T06:48:33Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
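For the CogDriving entry above, one plausible reading of "holistic-4D attention" is joint attention over a flattened view x time x height x width token axis; the class name, shapes, and module choice below are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical illustration of "holistic-4D attention": flatten view, time,
# height, and width into one token axis so a single attention call can
# associate across all four dimensions at once. Not the CogDriving code.
import torch
import torch.nn as nn

class Holistic4DAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, views, time, height, width, dim)
        b, v, t, h, w, d = x.shape
        tokens = x.reshape(b, v * t * h * w, d)  # one joint 4D token sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, v, t, h, w, d)

# Smoke test on a small latent grid: 6 views, 4 frames, 8x16 latents.
x = torch.randn(1, 6, 4, 8, 16, 32)
assert Holistic4DAttention(dim=32)(x).shape == x.shape
```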
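For the HomView-MOT entry, the summary gestures at a classical idea: compensate camera motion with a frame-to-frame homography before the usual IOU association. The sketch below is a hedged, generic version of that idea (helper names are hypothetical; the paper's actual matching pipeline is not given in the summary).

```python
# Generic homography-compensated IOU association (illustrative, not HomView-MOT).
import numpy as np

def warp_boxes(boxes: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Warp axis-aligned boxes (N x 4, x1 y1 x2 y2) by a 3x3 homography:
    warp the four corners, then re-take the axis-aligned extent."""
    x1, y1, x2, y2 = boxes.T
    corners = np.stack([np.stack([x1, y1], 1), np.stack([x2, y1], 1),
                        np.stack([x1, y2], 1), np.stack([x2, y2], 1)], axis=1)  # N x 4 x 2
    ones = np.ones((*corners.shape[:2], 1))
    warped = np.concatenate([corners, ones], axis=2) @ H.T  # homogeneous coords
    warped = warped[..., :2] / warped[..., 2:3]
    return np.concatenate([warped.min(axis=1), warped.max(axis=1)], axis=1)

def pairwise_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """IOU matrix (M x N) between two box sets; association (e.g. Hungarian
    matching) would run on warp_boxes(prev, H) vs. current detections."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(br - tl, 0, None).prod(-1)
    area = lambda x: (x[:, 2] - x[:, 0]) * (x[:, 3] - x[:, 1])
    return inter / (area(a)[:, None] + area(b)[None, :] - inter)
```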
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.