SkyDiffusion: Ground-to-Aerial Image Synthesis with Diffusion Models and BEV Paradigm
- URL: http://arxiv.org/abs/2408.01812v3
- Date: Thu, 19 Dec 2024 11:29:09 GMT
- Title: SkyDiffusion: Ground-to-Aerial Image Synthesis with Diffusion Models and BEV Paradigm
- Authors: Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, Conghui He
- Abstract summary: Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images.
We introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images.
We introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications.
- Score: 14.492759165786364
- Abstract: Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images while maintaining consistent content layout, simulating a top-down view. The significant viewpoint difference leads to domain gaps between views, and dense urban scenes limit the visible range of street views, making this cross-view generation task particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images, utilizing a diffusion model and the Bird's-Eye View (BEV) paradigm. The Curved-BEV method in SkyDiffusion converts street-view images into a BEV perspective, effectively bridging the domain gap, and employs a "multi-to-one" mapping strategy to address occlusion issues in dense urban scenes. SkyDiffusion then uses a BEV-guided diffusion model to generate content-consistent and realistic aerial images. Additionally, we introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications, including disaster scene aerial synthesis, historical high-resolution satellite image synthesis, and low-altitude UAV image synthesis tasks. Experimental results demonstrate that SkyDiffusion outperforms state-of-the-art methods on cross-view datasets across natural (CVUSA), suburban (CVACT), urban (VIGOR-Chicago), and various application scenarios (G2A-3), achieving realistic and content-consistent aerial image generation. More results and dataset information can be found at https://opendatalab.github.io/skydiffusion/ .
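The abstract's idea of converting a street-view image into a BEV perspective can be illustrated with a much simpler stand-in. The sketch below is not the paper's Curved-BEV method: it assumes a perfectly flat ground plane, an equirectangular panorama whose centre column faces north and whose centre row is the horizon, and a camera at the centre of the output grid. The function name `panorama_to_bev` and the parameters `bev_size`, `ground_range`, and `camera_height` are hypothetical, chosen only for illustration.

```python
import numpy as np


def panorama_to_bev(pano, bev_size=64, ground_range=20.0, camera_height=2.0):
    """Flat-ground inverse mapping from an equirectangular panorama to a top-down grid.

    Assumption (not from the paper): every BEV cell lies on a flat ground plane
    `camera_height` metres below the camera, which sits at the grid centre.
    """
    h, w = pano.shape[:2]

    # Metric coordinates of each BEV cell centre, in (-ground_range, ground_range).
    coords = (np.arange(bev_size) + 0.5) / bev_size * 2.0 - 1.0
    x = coords[None, :] * ground_range    # east grows with the column index
    y = -coords[:, None] * ground_range   # north is at row 0
    x, y = np.broadcast_arrays(x, y)

    # Heading of each cell, clockwise from north, and its ground distance.
    azimuth = np.arctan2(x, y)
    dist = np.hypot(x, y)

    # Angle below the horizon at which the viewing ray reaches the ground cell.
    depression = np.arctan2(camera_height, dist)

    # Equirectangular lookup (nearest neighbour).
    u = (azimuth / (2.0 * np.pi) + 0.5) * w
    v = (0.5 + depression / np.pi) * h
    cols = u.astype(int) % w
    rows = np.clip(v.astype(int), 0, h - 1)
    return pano[rows, cols]
```

Under the flat-ground assumption only the lower half of the panorama (below the horizon) is ever sampled, so building facades and anything above the horizon are ignored. That limitation is one reason a flat projection is only a rough baseline for dense urban scenes with occlusion.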
Related papers
- Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes [55.15494682493422]
We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques, to tackle the unified reconstruction and rendering for aerial and street views.
Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes.
arXiv Detail & Related papers (2024-12-02T17:42:00Z)
- From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model [16.716345249091408]
We explore Bird's-Eye View generation, converting a BEV map into its corresponding multi-view street images.
Our approach comprises two main components: the Neural View Transformation and the Street Image Generation.
arXiv Detail & Related papers (2024-09-02T07:47:16Z)
- CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis [54.852701978617056]
CrossViewDiff is a cross-view diffusion model for satellite-to-street view synthesis.
To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules.
To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method.
arXiv Detail & Related papers (2024-08-27T03:41:44Z)
- Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance [12.723045383279995]
We present a novel Geometric Preserving Ground-to-Aerial (G2A) model that can generate realistic aerial images from ground images.
To train our model, we present a new multi-modal cross-view dataset, namely VIGORv2.
We also present two applications, data augmentation for cross-view geo-localization and sketch-based region search.
arXiv Detail & Related papers (2024-08-08T05:17:27Z)
- Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner.
Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
- Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models [72.76182801289497]
We present a novel method, Aerial Diffusion, for generating aerial views from a single ground-view image using text guidance.
We address two main challenges arising from the domain gap between the ground view and the aerial view.
Aerial Diffusion is the first approach that performs ground-to-aerial translation in an unsupervised manner.
arXiv Detail & Related papers (2023-03-15T22:26:09Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Urban Radiance Fields [77.43604458481637]
We perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments.
Our approach extends Neural Radiance Fields, which has been demonstrated to synthesize realistic novel images for small scenes in controlled settings.
Each of these three extensions provides significant performance improvements in experiments on Street View data.
arXiv Detail & Related papers (2021-11-29T15:58:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.