Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
- URL: http://arxiv.org/abs/2408.15242v1
- Date: Tue, 27 Aug 2024 17:59:55 GMT
- Title: Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
- Authors: Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao
- Abstract summary: 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering.
The general fidelity of large-scale road scene renderings is often limited by the input imagery.
We introduce cross-view uncertainty to 3D-GS by matching car-view ensemble-based rendering uncertainty to aerial images.
- Score: 10.37108303188536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, data from the drone's perspective can complement data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively on aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS and yields no remarkable improvement on road views. To enhance novel view synthesis of road views and to use the aerial information effectively, we design an uncertainty-aware training method that lets aerial images assist in synthesizing areas where ground images have poor learning outcomes, rather than weighting all pixels equally in 3D-GS training as prior work did. We are the first to introduce cross-view uncertainty to 3D-GS, matching car-view ensemble-based rendering uncertainty to aerial images and weighting each pixel's contribution to the training process accordingly. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.
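The pixel-weighting scheme described in the abstract can be summarized in a short sketch. This is a minimal illustration under assumed conventions, not the authors' released code: the ensemble construction, the min-max normalization, and the names `ensemble_uncertainty` and `uncertainty_weighted_l1` are all assumptions.

```python
import torch

def ensemble_uncertainty(renders: torch.Tensor) -> torch.Tensor:
    """Per-pixel uncertainty from an ensemble of renderings.

    renders: (K, 3, H, W) -- the same aerial viewpoint rendered by K
    independently trained ground-view 3D-GS models. Disagreement is
    high exactly where ground-view training learned the scene poorly.
    """
    var = renders.var(dim=0).mean(dim=0)   # (H, W): variance over models, mean over RGB
    return var / (var.max() + 1e-8)        # normalize to [0, 1]

def uncertainty_weighted_l1(pred: torch.Tensor,
                            aerial_gt: torch.Tensor,
                            weight: torch.Tensor) -> torch.Tensor:
    """L1 photometric loss on an aerial image, weighted per pixel so
    aerial supervision focuses on high-uncertainty regions instead of
    weighting all pixels equally."""
    return (weight.unsqueeze(0) * (pred - aerial_gt).abs()).mean()
```

Under this reading, ground-view losses stay as in vanilla 3D-GS, while each aerial image's loss is modulated by its cross-view uncertainty map before back-propagation.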
Related papers
- AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis [57.249817395828174]
We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images.
The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images.
Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
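A minimal sketch of the hybrid-data idea described above; `HybridAerialGroundDataset` and the `synthetic_ratio` knob are hypothetical, and the paper may mix the two sources differently.

```python
import random
from torch.utils.data import Dataset

class HybridAerialGroundDataset(Dataset):
    """Mixes pseudo-synthetic aerial renderings with real ground images.

    `synthetic` and `real` are any map-style datasets returning training
    samples; `synthetic_ratio` controls how often a pseudo-synthetic
    aerial sample is drawn (an illustrative knob, not a published value).
    """
    def __init__(self, synthetic, real, synthetic_ratio: float = 0.5):
        self.synthetic, self.real = synthetic, real
        self.synthetic_ratio = synthetic_ratio

    def __len__(self):
        return len(self.synthetic) + len(self.real)

    def __getitem__(self, _):
        if random.random() < self.synthetic_ratio:
            return self.synthetic[random.randrange(len(self.synthetic))]
        return self.real[random.randrange(len(self.real))]
```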
arXiv Detail & Related papers (2025-04-17T17:57:05Z)
- Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs).
At their core is Novel View Synthesis (NVS), a capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs.
Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes.
We will release the data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z)
- Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes [55.15494682493422]
We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques, to tackle the unified reconstruction and rendering for aerial and street views.
Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes.
arXiv Detail & Related papers (2024-12-02T17:42:00Z)
- LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes [73.65115834242866]
Photorealistic simulation plays a crucial role in applications such as autonomous driving.
However, reconstruction quality suffers on street scenes due to collinear camera motions and sparser samplings at higher speeds.
We propose several insights that allow a better utilization of Lidar data to improve NeRF quality on street scenes.
arXiv Detail & Related papers (2024-05-01T23:07:12Z)
- SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model.
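One common way to realize such a diffusion prior is to treat a diffusion-refined rendering as a pseudo ground truth for viewpoints far from the training cameras. The sketch below assumes this formulation; `refine_with_diffusion` and `lambda_prior` are hypothetical stand-ins, not necessarily the paper's actual pipeline.

```python
import torch

def diffusion_prior_loss(gs_render: torch.Tensor,
                         refine_with_diffusion,
                         lambda_prior: float = 0.1) -> torch.Tensor:
    """Pseudo-supervision for a viewpoint far from the training cameras.

    refine_with_diffusion: any image-conditioned diffusion sampler that
    cleans up a degraded 3DGS rendering (hypothetical stand-in). Its
    output is computed without gradients and treated as a target, so
    gradients flow only into the Gaussians that produced gs_render.
    """
    with torch.no_grad():
        target = refine_with_diffusion(gs_render)
    return lambda_prior * (gs_render - target).abs().mean()
```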
arXiv Detail & Related papers (2024-03-29T09:20:29Z)
- Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation [10.898724668444125]
We present a learning-based approach capable of predicting terrain elevation maps at long range in real time, using only onboard egocentric images.
We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain.
arXiv Detail & Related papers (2024-01-30T22:37:24Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aimed at augmenting camera driving scenes in 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects, with adapted location and orientation, in pre-defined valid regions of the backgrounds.
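The two-step recipe (reconstruct with NeRF, then place objects in valid regions) can be sketched as follows; the box-shaped `valid_region` and the `render`/`render_onto` methods are hypothetical stand-ins for the paper's actual components.

```python
import math
import random

def sample_object_pose(valid_region, rng=random):
    """Sample a placement inside a pre-defined valid region of the
    background. valid_region is assumed to be an axis-aligned ground-
    plane box (x_min, x_max, y_min, y_max); the paper's representation
    may differ."""
    x = rng.uniform(valid_region[0], valid_region[1])
    y = rng.uniform(valid_region[2], valid_region[3])
    yaw = rng.uniform(-math.pi, math.pi)  # random heading
    return x, y, yaw

def augment_scene(background_nerf, object_nerfs, valid_region, camera):
    """Render the NeRF background, then composite each foreground
    object at a freshly sampled pose. render() / render_onto() are
    hypothetical stand-ins for the reconstructed NeRF renderers."""
    image = background_nerf.render(camera)
    for obj in object_nerfs:
        pose = sample_object_pose(valid_region)
        image = obj.render_onto(image, camera, pose)  # alpha-composite
    return image
```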
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
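A toy sketch of the reproject-then-translate interface described above; the real generator is a full GAN trained adversarially, whereas `ReprojectionToRGBD` below is only a minimal stand-in illustrating the input/output contract.

```python
import torch
import torch.nn as nn

class ReprojectionToRGBD(nn.Module):
    """Minimal stand-in for the paper's generator: maps a reprojected,
    incomplete RGB-D image (zeros where the point cloud gives no
    coverage) to a completed RGB-D image. A real model would be a full
    GAN generator; this tiny conv net only illustrates the interface."""
    def __init__(self, channels: int = 4):  # RGB + depth
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, reprojection: torch.Tensor) -> torch.Tensor:
        # reprojection: (B, 4, H, W) -> completed (B, 4, H, W)
        return self.net(reprojection)
```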
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.