AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
- URL: http://arxiv.org/abs/2504.13157v1
- Date: Thu, 17 Apr 2025 17:57:05 GMT
- Title: AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
- Authors: Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani
- Abstract summary: We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images. The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images. Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
- Score: 57.249817395828174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the task of geometric reconstruction of images captured from a mixture of ground and aerial views. Current state-of-the-art learning-based approaches fail to handle the extreme viewpoint variation between aerial-ground image pairs. Our hypothesis is that the lack of high-quality, co-registered aerial-ground datasets for training is a key reason for this failure. Such data is difficult to assemble precisely because it is difficult to reconstruct in a scalable way. To overcome this challenge, we propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes (e.g., Google Earth) with real, ground-level crowd-sourced images (e.g., MegaDepth). The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images where mesh-based renderings lack sufficient detail, effectively bridging the domain gap between real images and pseudo-synthetic renderings. Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks. For example, we observe that baseline DUSt3R localizes fewer than 5% of aerial-ground pairs within 5 degrees of camera rotation error, while fine-tuning with our data raises accuracy to nearly 56%, addressing a major failure point in handling large viewpoint changes. Beyond camera estimation and scene reconstruction, our dataset also improves performance on downstream tasks like novel-view synthesis in challenging aerial-ground scenarios, demonstrating the practical value of our approach in real-world applications.
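The headline result above is relative camera rotation accuracy on aerial-ground pairs. As a minimal sketch of how such a metric is typically computed (the function names and evaluation loop are our own, not taken from the paper), the geodesic angle between predicted and ground-truth rotations can be thresholded at 5 degrees:

```python
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_rel = R_pred @ R_gt.T
    # For a rotation by angle theta, trace(R_rel) = 1 + 2*cos(theta).
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def recall_at_5deg(pred_rotations, gt_rotations) -> float:
    """Fraction of image pairs localized within 5 degrees of rotation error."""
    errors = [rotation_error_deg(Rp, Rg)
              for Rp, Rg in zip(pred_rotations, gt_rotations)]
    return float(np.mean([e < 5.0 for e in errors]))
```

Under a metric of this form, the reported numbers correspond to the recall rising from below 0.05 (baseline DUSt3R) to roughly 0.56 after fine-tuning on the hybrid dataset.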
Related papers
- Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes [55.15494682493422]
We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques, to tackle the unified reconstruction and rendering for aerial and street views.
Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes.
arXiv Detail & Related papers (2024-12-02T17:42:00Z)
- Skyeyes: Ground Roaming using Aerial View Images [9.159470619808127]
We introduce Skyeyes, a novel framework that can generate sequences of ground view images using only aerial view inputs.
More specifically, we combine a 3D representation with a view consistent generation model, which ensures coherence between generated images.
The images maintain improved spatial-temporal coherence and realism, enhancing scene comprehension and visualization from aerial perspectives.
arXiv Detail & Related papers (2024-09-25T07:21:43Z)
- Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis [14.492759165786364]
Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images. We introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images. We also introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications.
arXiv Detail & Related papers (2024-08-03T15:43:56Z)
- MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References [49.71130133080821]
MaRINeR is a refinement method that leverages information from a nearby mapping image to improve the rendering of a target viewpoint.
We show improved renderings in quantitative metrics and qualitative examples from both explicit and implicit scene representations.
arXiv Detail & Related papers (2024-07-18T17:50:03Z)
- SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization [16.460851701725392]
We present a novel approach that optimizes radiance fields with scene graphs to mitigate the influence of outlier poses.
Our method incorporates an adaptive inlier-outlier confidence estimation scheme based on scene graphs.
We also introduce an effective intersection-over-union (IoU) loss to optimize the camera pose and surface geometry.
arXiv Detail & Related papers (2024-07-17T15:50:17Z)
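The SG-NeRF summary above names an IoU loss but does not specify its form; purely as an illustration (not the paper's exact formulation), a differentiable "soft" IoU between a rendered mask and a reference mask is commonly written as:

```python
import torch

def soft_iou_loss(pred_mask: torch.Tensor, gt_mask: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """1 - soft IoU between per-pixel masks with values in [0, 1].

    Products and sums replace hard set operations, so the loss stays
    differentiable and gradients can reach pose and geometry parameters.
    """
    intersection = (pred_mask * gt_mask).sum()
    union = pred_mask.sum() + gt_mask.sum() - intersection
    return 1.0 - intersection / (union + eps)
```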
- Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images.
Objects in urban aerial images, such as buildings, cars, and roads, exhibit substantial variations in size.
We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes.
We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z)
- Exposure Bracketing Is All You Need For A High-Quality Image [50.822601495422916]
Multi-exposure images are complementary in denoising, deblurring, high dynamic range imaging, and super-resolution. In this work, we propose to utilize exposure bracketing photography to obtain a high-quality image by combining these tasks. In particular, a temporally modulated recurrent network (TMRNet) and a self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z)
- Render-and-Compare: Cross-View 6 DoF Localization from Noisy Prior [17.08552155321949]
In this work, we propose to go beyond the traditional ground-level setting and exploit cross-view localization from aerial to ground.
As no public dataset exists for the studied problem, we collect a new dataset that provides a variety of cross-view images from smartphones and drones.
We develop a semi-automatic system to acquire ground-truth poses for query images.
arXiv Detail & Related papers (2023-02-13T11:43:47Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Ground material classification for UAV-based photogrammetric 3D data: A 2D-3D Hybrid Approach [1.3359609092684614]
In recent years, photogrammetry has been widely used in many areas to create 3D virtual data representing the physical environment.
These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations.
arXiv Detail & Related papers (2021-09-24T22:29:26Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
arXiv Detail & Related papers (2021-04-09T02:58:59Z)
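The stereo-matching entry above lists a cross photometric loss and a smoothness loss without details. As an assumption-laden sketch, a standard self-supervised photometric term (a Monodepth-style blend of SSIM and L1, not necessarily this paper's exact formulation) compares a reference view against a second view warped into its frame:

```python
import torch
import torch.nn.functional as F

def photometric_loss(img: torch.Tensor, img_warped: torch.Tensor,
                     alpha: float = 0.85) -> torch.Tensor:
    """SSIM/L1 blend between a reference view and a view warped into its
    frame via the predicted disparity; inputs are (B, C, H, W) in [0, 1]."""
    l1 = (img - img_warped).abs()

    # Single-scale SSIM with 3x3 average pooling as the local window.
    mu_x = F.avg_pool2d(img, 3, 1, 1)
    mu_y = F.avg_pool2d(img_warped, 3, 1, 1)
    sigma_x = F.avg_pool2d(img * img, 3, 1, 1) - mu_x * mu_x
    sigma_y = F.avg_pool2d(img_warped * img_warped, 3, 1, 1) - mu_y * mu_y
    sigma_xy = F.avg_pool2d(img * img_warped, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
           ((mu_x * mu_x + mu_y * mu_y + c1) * (sigma_x + sigma_y + c2))
    dssim = ((1.0 - ssim) / 2.0).clamp(0.0, 1.0)

    return (alpha * dssim + (1.0 - alpha) * l1).mean()
```

A low loss means the warped view photometrically agrees with the reference, which is the supervisory signal that replaces ground-truth disparity in such self-supervised setups.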
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the list (including all information) and is not responsible for any consequences.