Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance
- URL: http://arxiv.org/abs/2408.04224v2
- Date: Tue, 20 Aug 2024 19:13:08 GMT
- Title: Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance
- Authors: Ahmad Arrabi, Xiaohan Zhang, Waqas Sultani, Chen Chen, Safwan Wshah
- Abstract summary: We present a novel Geometric Preserving Ground-to-Aerial (GPG2A) image synthesis model that can generate realistic aerial images from ground images.
To train our model, we present a new multi-modal cross-view dataset, namely VIGORv2.
We also present two applications, data augmentation for cross-view geo-localization and sketch-based region search.
- Score: 12.723045383279995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aerial imagery analysis is critical for many research fields. However, obtaining frequent, high-quality aerial images is not always feasible because of the high effort and cost involved. One solution is to use the Ground-to-Aerial (G2A) technique to synthesize aerial images from easily collectible ground images. However, G2A is rarely studied because of its challenges, including but not limited to drastic view changes, occlusion, and range of visibility. In this paper, we present a novel Geometric Preserving Ground-to-Aerial (G2A) image synthesis (GPG2A) model that can generate realistic aerial images from ground images. GPG2A consists of two stages. The first stage predicts the Bird's Eye View (BEV) segmentation (referred to as the BEV layout map) from the ground image. The second stage synthesizes the aerial image from the predicted BEV layout map and text descriptions of the ground image. To train our model, we present a new multi-modal cross-view dataset, VIGORv2, which is built upon VIGOR with newly collected aerial images, maps, and text descriptions. Our extensive experiments illustrate that GPG2A synthesizes better geometry-preserved aerial images than existing models. We also present two applications, data augmentation for cross-view geo-localization and sketch-based region search, to further verify the effectiveness of GPG2A. The code and data will be publicly available.
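As a rough illustration of the two-stage pipeline described in the abstract (Stage I: ground image to BEV layout map; Stage II: BEV layout map plus text description to aerial image), the sketch below shows how data could flow between the two stages. All module names, layer choices, and tensor shapes are assumptions made for illustration only; they do not reflect the authors' actual GPG2A architecture, which uses a diffusion model in the second stage.

```python
# Minimal two-stage ground-to-aerial sketch in the spirit of GPG2A.
# All modules and shapes below are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class BEVLayoutPredictor(nn.Module):
    """Stage I (assumed): map a ground image to a coarse BEV layout map (per-cell class logits)."""
    def __init__(self, num_classes: int = 8, bev_size: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(                       # toy CNN encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((bev_size, bev_size)),
        )
        self.head = nn.Conv2d(64, num_classes, 1)           # per-cell semantic logits

    def forward(self, ground_img: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(ground_img))          # (B, C, H_bev, W_bev)

class AerialSynthesizer(nn.Module):
    """Stage II (assumed): synthesize an aerial image conditioned on the BEV layout
    map and a text embedding (a simple stand-in for the diffusion-based generator)."""
    def __init__(self, num_classes: int = 8, text_dim: int = 512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, num_classes)   # fuse text condition into layout channels
        self.decoder = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, bev_logits: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        bias = self.text_proj(text_emb)[:, :, None, None]   # broadcast text condition over the BEV grid
        return self.decoder(bev_logits + bias)               # (B, 3, H_aerial, W_aerial)

if __name__ == "__main__":
    ground = torch.randn(1, 3, 256, 512)                    # dummy ground panorama
    text = torch.randn(1, 512)                               # dummy text embedding
    bev = BEVLayoutPredictor()(ground)
    aerial = AerialSynthesizer()(bev, text)
    print(bev.shape, aerial.shape)                           # (1, 8, 64, 64) and (1, 3, 256, 256)
```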
Related papers
- BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization [11.50186721264038]
This paper addresses the problem of weakly supervised cross-view localization.
The goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations.
We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives.
arXiv Detail & Related papers (2025-02-13T08:54:04Z)
- SkyDiffusion: Ground-to-Aerial Image Synthesis with Diffusion Models and BEV Paradigm [14.492759165786364]
Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images.
We introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images.
We introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications.
arXiv Detail & Related papers (2024-08-03T15:43:56Z)
- Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth [56.565405280314884]
This paper focuses on improving the performance of a trained model in a new target area by leveraging only target-area images without fine ground truth (GT).
We propose a weakly supervised learning approach based on knowledge self-distillation.
Our approach is validated using two recent state-of-the-art models on two benchmarks.
arXiv Detail & Related papers (2024-06-01T15:58:35Z)
- Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images.
Objects in urban aerial images, including buildings, cars, and roads, exhibit substantial variations in size.
We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes.
We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- Ground-to-Aerial Person Search: Benchmark Dataset and Approach [42.54151390290665]
We construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS.
G2APS contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both UAV and ground surveillance cameras.
arXiv Detail & Related papers (2023-08-24T11:11:26Z)
- GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation [91.01581867841894]
We propose the GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts.
Our GeoDiffusion is able to encode not only the bounding boxes but also extra geometric conditions such as camera views in self-driving scenes.
arXiv Detail & Related papers (2023-06-07T17:17:58Z)
- Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models [72.76182801289497]
We present a novel method, Aerial Diffusion, for generating aerial views from a single ground-view image using text guidance.
We address two main challenges arising from the domain gap between the ground view and the aerial view.
Aerial Diffusion is the first approach that performs ground-to-aerial translation in an unsupervised manner.
arXiv Detail & Related papers (2023-03-15T22:26:09Z)
- Real-time Geo-localization Using Satellite Imagery and Topography for Unmanned Aerial Vehicles [18.71806336611299]
We propose a framework that is reliable in changing scenes and pragmatic for lightweight embedded systems on UAVs.
The framework comprises two stages: offline database preparation and online inference.
We present field experiments of image-based localization on two different UAV platforms to validate our results.
arXiv Detail & Related papers (2021-08-07T01:47:19Z)
- AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification [2.931113769364182]
We present two new publicly available datasets, named AiRound and CV-BrCT.
The first one contains triplets of images from the same geographic coordinate with different perspectives of view extracted from various places around the world.
The second dataset contains pairs of aerial and street-level images extracted from southeast Brazil.
arXiv Detail & Related papers (2020-08-03T18:55:46Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)