CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis
- URL: http://arxiv.org/abs/2408.14765v1
- Date: Tue, 27 Aug 2024 03:41:44 GMT
- Title: CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis
- Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He
- Abstract summary: CrossViewDiff is a cross-view diffusion model for satellite-to-street view synthesis.
To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules.
To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method.
- Score: 54.852701978617056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Satellite-to-street view synthesis aims at generating a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis task. In this work, we propose CrossViewDiff, a cross-view diffusion model for satellite-to-street view synthesis. To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules to construct the structural and textural controls for street-view image synthesis. We further design a cross-view control guided denoising process that incorporates the above controls via an enhanced cross-view attention module. To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method as a supplement to standard evaluation metrics. We also explore the effect of different data sources (e.g., text, maps, building heights, and multi-temporal satellite imagery) on this task. Results on three public cross-view datasets show that CrossViewDiff outperforms current state-of-the-art methods on both standard and GPT-based evaluation metrics, generating high-quality street-view panoramas with more realistic structures and textures across rural, suburban, and urban scenes. The code and models of this work will be released at https://opendatalab.github.io/CrossViewDiff/.
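To make the abstract's "cross-view control guided denoising" concrete, the snippet below is a minimal PyTorch sketch of one way satellite-derived structural and textural controls could be injected into street-view denoising features via cross-attention. It is an illustration under assumptions only: the class name `CrossViewControlAttention`, the tensor shapes, and the residual-injection design are hypothetical and are not taken from the released CrossViewDiff code.

```python
# Illustrative sketch only: module and tensor names are assumptions,
# not the released CrossViewDiff interface.
import torch
import torch.nn as nn

class CrossViewControlAttention(nn.Module):
    """Lets street-view denoising features attend to satellite-derived controls."""

    def __init__(self, dim: int, ctrl_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, kdim=ctrl_dim,
                                          vdim=ctrl_dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, street_feat, structure_ctrl, texture_ctrl):
        # Concatenate structural and textural control tokens so a single
        # attention pass can draw on both cues at once.
        controls = torch.cat([structure_ctrl, texture_ctrl], dim=1)
        attended, _ = self.attn(query=self.norm(street_feat),
                                key=controls, value=controls)
        # Residual injection back into the denoiser's feature stream.
        return street_feat + attended


# Toy usage: batch of 2, 1024 street-view latent tokens of width 320,
# 256 control tokens of width 64 from each satellite-derived map.
street = torch.randn(2, 1024, 320)
structure = torch.randn(2, 256, 64)
texture = torch.randn(2, 256, 64)
out = CrossViewControlAttention(dim=320, ctrl_dim=64)(street, structure, texture)
print(out.shape)  # torch.Size([2, 1024, 320])
```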
Related papers
- SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm [12.818880200888504]
We introduce SkyDiffusion, a novel cross-view generation method for synthesizing satellite images from street-view images.
We show that SkyDiffusion outperforms state-of-the-art methods on both suburban (CVUSA & CVACT) and urban cross-view datasets.
arXiv Detail & Related papers (2024-08-03T15:43:56Z) - Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset through comparing visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z) - Cross-View Panorama Image Synthesis [68.35351563852335]
We propose PanoGAN, a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z) - A Graph-Matching Approach for Cross-view Registration of Over-view and
Street-view based Point Clouds [4.742825811314168]
We propose a fully automated geo-registration method for cross-view data, which utilizes semantically segmented object boundaries as view-invariant features.
The proposed method models segments of buildings as nodes of graphs, both detected from the satellite-based and street-view based point clouds.
The matched nodes will be subject to further optimization to allow precise registration, followed by a constrained bundle adjustment on the street-view image to keep 2D-3D consistencies.
arXiv Detail & Related papers (2022-02-14T16:43:28Z) - Coming Down to Earth: Satellite-to-Street View Synthesis for
Geo-Localization [9.333087475006003]
Cross-view image based geo-localization is notoriously challenging due to drastic viewpoint and appearance differences between the two domains.
We show that we can address this discrepancy explicitly by learning to synthesize realistic street views from satellite inputs.
We propose a novel multi-task architecture in which image synthesis and retrieval are considered jointly.
arXiv Detail & Related papers (2021-03-11T17:40:59Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google-style omnidirectional street-view panorama, as if it were captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z) - Street-view Panoramic Video Synthesis from a Single Satellite Image [92.26826861266784]
We present a novel method for synthesizing both temporally and geometrically consistent street-view panoramic video.
Existing cross-view synthesis approaches focus mainly on images, while video synthesis in this setting has not yet received enough attention.
arXiv Detail & Related papers (2020-12-11T20:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.