I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation
- URL: http://arxiv.org/abs/2507.23683v1
- Date: Thu, 31 Jul 2025 15:59:16 GMT
- Title: I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation
- Authors: Jialei Chen, Wuhao Xu, Sipeng He, Baoru Huang, Dongchun Ren,
- Abstract summary: We introduce a novel method, I2V-GS, to transfer the Infrastructure view To the Vehicle view with Gaussian Splatting. We also introduce RoadSight, a multi-modality, multi-view dataset from real scenarios in infrastructure views. I2V-GS significantly improves quality under vehicle view, outperforming StreetGaussian in NTA-IoU, NTL-IoU, and FID by 45.7%, 34.2%, and 14.9%, respectively.
- Score: 4.041586891110227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vast and high-quality data are essential for end-to-end autonomous driving systems. However, current driving data is mainly collected by vehicles, which is expensive and inefficient. A potential solution lies in synthesizing data from real-world images. Recent advancements in 3D reconstruction demonstrate photorealistic novel view synthesis, highlighting the potential of generating driving data from images captured on the road. This paper introduces a novel method, I2V-GS, to transfer the Infrastructure view To the Vehicle view with Gaussian Splatting. Reconstruction from sparse infrastructure viewpoints and rendering under large view transformations is a challenging problem. We adopt an adaptive depth warp to generate dense training views. To further expand the range of views, we employ a cascade strategy to inpaint warped images, which also ensures inpainting content is consistent across views. To further ensure the reliability of the diffusion model, we utilize the cross-view information to perform a confidence-guided optimization. Moreover, we introduce RoadSight, a multi-modality, multi-view dataset from real scenarios in infrastructure views. To our knowledge, I2V-GS is the first framework to generate autonomous driving datasets with infrastructure-vehicle view transformation. Experimental results demonstrate that I2V-GS significantly improves synthesis quality under vehicle view, outperforming StreetGaussian in NTA-IoU, NTL-IoU, and FID by 45.7%, 34.2%, and 14.9%, respectively.
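Of the steps named in the abstract, the depth warp lends itself to a quick illustration. The sketch below shows the general idea of reprojecting an infrastructure-camera image into a hypothetical vehicle viewpoint given per-pixel depth and a relative camera pose; the function name, camera conventions, and parameters are illustrative assumptions, not the paper's actual implementation, which adapts the warp and follows it with cascade inpainting and confidence-guided optimization.

```python
# Minimal NumPy sketch of depth-based view warping (assumed conventions, not the
# authors' code): pinhole cameras, metric depth, and a vehicle image of the same
# resolution as the infrastructure image.
import numpy as np

def warp_infra_to_vehicle(infra_img, infra_depth, K_infra, K_veh, T_veh_from_infra):
    """Forward-warp an infrastructure-view image to a hypothetical vehicle view.

    infra_img:   (H, W, 3) image from the roadside camera.
    infra_depth: (H, W) per-pixel metric depth in the infrastructure camera frame.
    K_infra, K_veh: (3, 3) camera intrinsics.
    T_veh_from_infra: (4, 4) rigid transform from infrastructure-camera to
                      vehicle-camera coordinates.
    Returns the warped image and a validity mask (holes where no source pixel lands).
    """
    H, W = infra_depth.shape
    # Back-project every infrastructure pixel to a 3D point.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # (3, H*W)
    rays = np.linalg.inv(K_infra) @ pix                                  # (3, H*W)
    pts_infra = rays * infra_depth.reshape(1, -1)                        # (3, H*W)

    # Move the points into the vehicle camera frame.
    pts_h = np.vstack([pts_infra, np.ones((1, pts_infra.shape[1]))])     # (4, H*W)
    pts_veh = (T_veh_from_infra @ pts_h)[:3]                             # (3, H*W)

    # Project into the vehicle image plane; keep points in front of the camera.
    proj = K_veh @ pts_veh
    z = proj[2]
    valid = z > 1e-6
    uv = np.round(proj[:2, valid] / z[valid]).astype(int)

    warped = np.zeros_like(infra_img)
    mask = np.zeros((H, W), dtype=bool)
    in_bounds = (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H)
    src = np.flatnonzero(valid)[in_bounds]
    warped[uv[1, in_bounds], uv[0, in_bounds]] = infra_img.reshape(-1, 3)[src]
    mask[uv[1, in_bounds], uv[0, in_bounds]] = True
    return warped, mask
```

A naive forward warp like this leaves holes and ignores occlusion, especially under the large infrastructure-to-vehicle viewpoint change; those gaps are what the paper's cascade inpainting and cross-view confidence-guided optimization are described as addressing.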
Related papers
- BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images [21.811586185200706]
This paper addresses the challenge of reconstructing vehicles from sparse-view inputs. We leverage depth maps and a robust pose estimation architecture to synthesize novel views. We present a novel dataset featuring both synthetic and real-world public transportation vehicles.
arXiv Detail & Related papers (2025-07-16T10:04:35Z)
- MTGS: Multi-Traversal Gaussian Splatting [51.22657444433942]
Multi-traversal data provides multiple viewpoints for scene reconstruction within a road block. We propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines.
arXiv Detail & Related papers (2025-03-16T15:46:12Z)
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z) - V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models [13.716889927164383]
Vehicle-to-everything (V2X) cooperation has emerged as a promising paradigm to overcome the perception limitations of classical autonomous driving. This paper introduces V2X-VLM, a novel end-to-end (E2E) cooperative autonomous driving framework based on vision-language models (VLMs). V2X-VLM integrates multi-perspective camera views from vehicles and infrastructure with text-based scene descriptions to enable a more comprehensive understanding of driving environments.
arXiv Detail & Related papers (2024-08-17T16:42:13Z)
- Learning Lane Graphs from Aerial Imagery Using Transformers [7.718401895021425]
This work introduces a novel approach to generating successor lane graphs from aerial imagery.
We frame successor lane graphs as a collection of maximal length paths and predict them using a Detection Transformer (DETR) architecture.
We demonstrate the efficacy of our method through extensive experiments on the diverse and large-scale UrbanLaneGraph dataset.
arXiv Detail & Related papers (2024-07-08T07:42:32Z)
- Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view. To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- Deep Perspective Transformation Based Vehicle Localization on Bird's Eye View [0.49747156441456597]
Traditional approaches rely on installing multiple sensors to simulate the environment.
We propose an alternative solution by generating a top-down representation of the scene.
We present an architecture that transforms perspective view RGB images into bird's-eye-view maps with segmented surrounding vehicles.
arXiv Detail & Related papers (2023-11-12T10:16:42Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Efficient Self-supervised Vision Transformers for Representation Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity.
We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies.
Our results show that combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)