SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm
- URL: http://arxiv.org/abs/2408.01812v2
- Date: Sat, 17 Aug 2024 08:05:02 GMT
- Title: SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm
- Authors: Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Jinhua Yu, Haote Yang, Conghui He,
- Abstract summary: We introduce SkyDiffusion, a novel cross-view generation method for synthesizing satellite images from street-view images.
We show that SkyDiffusion outperforms state-of-the-art methods on both suburban (CVUSA & CVACT) and urban cross-view datasets.
- Score: 12.818880200888504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Street-to-satellite image synthesis focuses on generating realistic satellite images from corresponding ground street-view images while maintaining a consistent content layout, similar to looking down from the sky. The significant differences in perspectives create a substantial domain gap between the views, making this cross-view generation task particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing satellite images from street-view images, leveraging diffusion models and Bird's Eye View (BEV) paradigm. First, we design a Curved-BEV method to transform street-view images to the satellite view, reformulating the challenging cross-domain image synthesis task into a conditional generation problem. Curved-BEV also includes a "Multi-to-One" mapping strategy for leveraging multiple street-view images within the same satellite coverage area, effectively solving the occlusion issues in dense urban scenes. Next, we design a BEV-controlled diffusion model to generate satellite images consistent with the street-view content, which also incorporates a light manipulation module to make the lighting conditions of the synthesized satellite images more flexible. Experimental results demonstrate that SkyDiffusion outperforms state-of-the-art methods on both suburban (CVUSA & CVACT) and urban (VIGOR-Chicago) cross-view datasets, with an average SSIM increase of 13.96% and a FID reduction of 20.54%, achieving realistic and content-consistent satellite image generation. The code and models of this work will be released at https://opendatalab.github.io/skydiffusion
Related papers
- From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model [16.716345249091408]
We explore Bird's-Eye View generation, converting a BEV map into its corresponding multi-view street images.
Our approach comprises two main components: the Neural View Transformation and the Street Image Generation.
arXiv Detail & Related papers (2024-09-02T07:47:16Z) - CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis [54.852701978617056]
CrossViewDiff is a cross-view diffusion model for satellite-to-street view synthesis.
To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules.
To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method.
arXiv Detail & Related papers (2024-08-27T03:41:44Z) - SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation [12.692812966686066]
We introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation.
Our method achieves an increase in mIOU by 10.13% and 5.21% compared with the state-of-the-art satellite-based and cross-view methods.
arXiv Detail & Related papers (2024-04-03T10:57:47Z) - Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner.
Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z) - DiffusionSat: A Generative Foundation Model for Satellite Imagery [63.2807119794691]
We present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets.
Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting.
arXiv Detail & Related papers (2023-12-06T16:53:17Z) - Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z) - Urban Radiance Fields [77.43604458481637]
We perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments.
Our approach extends Neural Radiance Fields, which has been demonstrated to synthesize realistic novel images for small scenes in controlled settings.
Each of these three extensions provides significant performance improvements in experiments on Street View data.
arXiv Detail & Related papers (2021-11-29T15:58:16Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google's omnidirectional street-view type panorama, as if it is captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.