From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
- URL: http://arxiv.org/abs/2509.24369v1
- Date: Mon, 29 Sep 2025 07:14:49 GMT
- Title: From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis
- Authors: Khawlah Bajbaa, Abbas Anwar, Muhammad Saqib, Hafeez Anwar, Nabin Sharma, Muhammad Usman,
- Abstract summary: This paper presents a hybrid framework that integrates diffusion-based models and conditional generative adversarial networks to generate street-view images from satellite imagery. The framework successfully generates realistic and geometrically consistent street-view images while preserving fine-grained local details.
- Score: 3.2018357347625037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Street view imagery has become an essential source for geospatial data collection and urban analytics, enabling the extraction of valuable insights that support informed decision-making. However, synthesizing street-view images from corresponding satellite imagery presents significant challenges due to substantial differences in appearance and viewing perspective between these two domains. This paper presents a hybrid framework that integrates diffusion-based models and conditional generative adversarial networks to generate geographically consistent street-view images from satellite imagery. Our approach uses a multi-stage training strategy that incorporates Stable Diffusion as the core component within a dual-branch architecture. To enhance the framework's capabilities, we integrate a conditional Generative Adversarial Network (GAN) that enables the generation of geographically consistent panoramic street views. Furthermore, we implement a fusion strategy that leverages the strengths of both models to create robust representations, thereby improving the geometric consistency and visual quality of the generated street-view images. The proposed framework is evaluated on the challenging Cross-View USA (CVUSA) dataset, a standard benchmark for cross-view image synthesis. Experimental results demonstrate that our hybrid approach outperforms diffusion-only methods across multiple evaluation metrics and achieves competitive performance compared to state-of-the-art GAN-based methods. The framework successfully generates realistic and geometrically consistent street-view images while preserving fine-grained local details, including street markings, secondary roads, and atmospheric elements such as clouds.
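The abstract describes a fusion strategy that combines representations from the Stable Diffusion branch and the conditional GAN branch, but does not specify its form. As an illustration only, a minimal weighted-blend fusion of the two branches' feature maps might look like the following sketch; the function name, the `(H, W, C)` shapes, the linear blend, and the `alpha` weight are all assumptions, not the authors' published method:

```python
import numpy as np

def fuse_branch_features(diffusion_feat: np.ndarray,
                         gan_feat: np.ndarray,
                         alpha: float = 0.6) -> np.ndarray:
    """Blend feature maps from a diffusion branch and a GAN branch.

    Both inputs are (H, W, C) arrays; `alpha` is a hypothetical
    weight given to the diffusion branch's features.
    """
    if diffusion_feat.shape != gan_feat.shape:
        raise ValueError("branch feature maps must share the same shape")
    return alpha * diffusion_feat + (1.0 - alpha) * gan_feat

# Toy example: one 4x8 feature map with 3 channels per branch.
diffusion_feat = np.full((4, 8, 3), 1.0)  # stand-in for diffusion features
gan_feat = np.full((4, 8, 3), 0.0)        # stand-in for GAN features
fused = fuse_branch_features(diffusion_feat, gan_feat, alpha=0.6)
print(fused[0, 0, 0])  # 0.6
```

In practice such a fusion would operate on learned feature tensors inside the network, and the blend weights would likely be learned rather than fixed; this sketch only shows the shape of the idea.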
Related papers
- JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization [26.250213248316342]
Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. Existing methods predominantly rely on semantic features from RGB images. We introduce a Joint perception network to integrate RGB and Normal images.
arXiv Detail & Related papers (2025-09-06T12:11:51Z)
- Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis [48.945931374180795]
This paper presents a novel approach for cross-view synthesis aimed at generating plausible ground-level images from corresponding satellite imagery or vice versa. We refer to these tasks as satellite-to-ground (Sat2Grd) and ground-to-satellite (Grd2Sat) synthesis, respectively.
arXiv Detail & Related papers (2024-12-04T13:47:51Z)
- CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis [54.852701978617056]
CrossViewDiff is a cross-view diffusion model for satellite-to-street view synthesis.
To address the challenges posed by the large discrepancy across views, we design the satellite scene structure estimation and cross-view texture mapping modules.
To achieve a more comprehensive evaluation of the synthesis results, we additionally design a GPT-based scoring method.
arXiv Detail & Related papers (2024-08-27T03:41:44Z)
- Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis [39.43518544801439]
Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street view images. We introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images. We introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications.
arXiv Detail & Related papers (2024-08-03T15:43:56Z)
- Rethinking Image Skip Connections in StyleGAN2 [5.929956715430167]
StyleGAN models have gained significant traction in the field of image synthesis.
The adoption of image skip connection is favored over the traditional residual connection.
We introduce the image squeeze connection, which significantly improves the quality of image synthesis.
arXiv Detail & Related papers (2024-07-08T00:21:17Z)
- Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- Cross-View Panorama Image Synthesis [68.35351563852335]
We propose PanoGAN, a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z)
- Dual-Camera Super-Resolution with Aligned Attention Modules [56.54073689003269]
We present a novel approach to reference-based super-resolution (RefSR) with a focus on dual-camera super-resolution (DCSR).
Our proposed method generalizes the standard patch-based feature matching with spatial alignment operations.
To bridge the domain gaps between real-world images and the training images, we propose a self-supervised domain adaptation strategy.
arXiv Detail & Related papers (2021-09-03T07:17:31Z)
- Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization [9.333087475006003]
Cross-view image based geo-localization is notoriously challenging due to drastic viewpoint and appearance differences between the two domains.
We show that we can address this discrepancy explicitly by learning to synthesize realistic street views from satellite inputs.
We propose a novel multi-task architecture in which image synthesis and retrieval are considered jointly.
arXiv Detail & Related papers (2021-03-11T17:40:59Z)
- Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google Street View-style omnidirectional panorama, as if captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.