Cross-View Panorama Image Synthesis
- URL: http://arxiv.org/abs/2203.11832v1
- Date: Tue, 22 Mar 2022 15:59:44 GMT
- Title: Cross-View Panorama Image Synthesis
- Authors: Songsong Wu, Hao Tang, Xiao-Yuan Jing, Haifeng Zhao, Jianjun Qian,
Nicu Sebe, and Yan Yan
- Abstract summary: PanoGAN is a novel adversarial feedback GAN framework that
enables high-quality panorama image generation with more convincing details
than state-of-the-art approaches.
- Score: 68.35351563852335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of synthesizing a ground-view panorama
image conditioned on a top-view aerial image, which is a challenging problem
due to the large gap between the two image domains with different viewpoints.
Instead of learning cross-view mapping in a feedforward pass, we propose a
novel adversarial feedback GAN framework named PanoGAN with two key components:
an adversarial feedback module and a dual branch discrimination strategy.
First, the aerial image is fed into the generator to produce a target panorama
image and its associated segmentation map to support model training with
layout semantics. Second, the feature responses of the discriminator encoded by
our adversarial feedback module are fed back to the generator to refine the
intermediate representations, so that the generation performance is continually
improved through an iterative generation process. Third, to pursue
high-fidelity and semantic consistency of the generated panorama image, we
propose a pixel-segmentation alignment mechanism under the dual branch
discrimination strategy to facilitate cooperation between the generator and the
discriminator. Extensive experimental results on two challenging cross-view
image datasets show that PanoGAN enables high-quality panorama image generation
with more convincing details than state-of-the-art approaches. The source code
and trained models are available at \url{https://github.com/sswuai/PanoGAN}.
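
Below is a minimal PyTorch-style sketch of the iterative adversarial feedback loop described in the abstract: the generator produces a panorama and a segmentation map, the dual-branch discriminator scores both against the conditioning aerial image, and its re-encoded intermediate features are fed back to refine the next generation pass. All module names, layer choices, and the iteration count are illustrative assumptions, not the authors' implementation; see the linked repository for the real one.

import torch
import torch.nn as nn

class PanoGenerator(nn.Module):
    # Maps an aerial image (plus optional feedback features) to a panorama
    # and an associated segmentation map (dual outputs for layout semantics).
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(feat_ch * 2, feat_ch, 1)      # merges feedback with encoder features
        self.to_img = nn.Conv2d(feat_ch, 3, 3, padding=1)   # panorama branch
        self.to_seg = nn.Conv2d(feat_ch, 3, 3, padding=1)   # segmentation branch

    def forward(self, aerial, feedback=None):
        h = self.encode(aerial)
        if feedback is not None:                             # refine with discriminator feedback
            h = self.fuse(torch.cat([h, feedback], dim=1))
        return torch.tanh(self.to_img(h)), torch.tanh(self.to_seg(h))

class DualBranchDiscriminator(nn.Module):
    # Scores (aerial, panorama) and (aerial, segmentation) pairs and exposes
    # intermediate features for the adversarial feedback module.
    def __init__(self, feat_ch=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(6, feat_ch, 3, padding=1), nn.LeakyReLU(0.2))
        self.img_head = nn.Conv2d(feat_ch, 1, 3, padding=1)
        self.seg_head = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, aerial, pano, seg):
        f_img = self.shared(torch.cat([aerial, pano], dim=1))
        f_seg = self.shared(torch.cat([aerial, seg], dim=1))
        return self.img_head(f_img), self.seg_head(f_seg), f_img

# Adversarial feedback module (assumed form): re-encodes discriminator
# features so the generator can consume them on the next pass.
feedback_enc = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())

G, D = PanoGenerator(), DualBranchDiscriminator()
aerial = torch.randn(1, 3, 128, 256)                         # dummy top-view input

feedback = None
for _ in range(3):                                           # iterative generation process
    pano, seg = G(aerial, feedback)
    score_img, score_seg, d_feat = D(aerial, pano, seg)
    feedback = feedback_enc(d_feat)
print(pano.shape, seg.shape)                                 # both torch.Size([1, 3, 128, 256])

Note that this sketch only traces the forward refinement loop; the pixel-segmentation alignment losses that couple the two discriminator branches during training are omitted.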
Related papers
- Learning Representations for Clustering via Partial Information
Discrimination and Cross-Level Interaction [5.101836008369192]
We present a novel deep image clustering approach termed PICI, which enforces partial information discrimination and cross-level interaction.
In particular, we leverage a Transformer encoder as the backbone, through which masked image modeling with two parallel augmented views is formulated.
arXiv Detail & Related papers (2024-01-24T14:51:33Z)
- Learn From Orientation Prior for Radiograph Super-Resolution: Orientation
Operator Transformer [8.009052363001903]
High-resolution radiographic images play a pivotal role in the early diagnosis and treatment of skeletal muscle-related diseases.
Introducing a single-image super-resolution (SISR) model into the radiology image field is a promising way to enhance image quality.
The conventional image pipeline, which can learn a mixed mapping between SR and denoising from the color space and inter-pixel patterns, poses a particular challenge for radiographic images with limited pattern features.
arXiv Detail & Related papers (2023-12-27T07:56:24Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding
Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Generalizable Person Re-Identification via Viewpoint Alignment and Fusion
[74.30861504619851]
This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images.
Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images.
We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
arXiv Detail & Related papers (2022-12-05T16:24:09Z)
- Bridging the Visual Gap: Wide-Range Image Blending [16.464837892640812]
We introduce an effective deep-learning model to realize wide-range image blending.
We experimentally demonstrate that our proposed method is able to produce visually appealing results.
arXiv Detail & Related papers (2021-03-28T15:07:45Z)
- DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image
Generation [8.26410341981427]
The Dual Attention Generative Adversarial Network (DTGAN) can synthesize high-quality and semantically consistent images.
The proposed model introduces channel-aware and pixel-aware attention modules that can guide the generator to focus on text-relevant channels and pixels.
A new type of visual loss is utilized to enhance the image resolution by ensuring vivid shape and perceptually uniform color distributions of generated images.
arXiv Detail & Related papers (2020-11-05T08:57:15Z)
- Image-to-image Mapping with Many Domains by Sparse Attribute Transfer
[71.28847881318013]
Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points.
The current convention is to approach this task with cycle-consistent GANs.
We propose an alternate approach that directly restricts the generator to performing a simple sparse transformation in a latent layer.
arXiv Detail & Related papers (2020-06-23T19:52:23Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding
Approach [104.02201472370801]
We propose a novel image coding framework that leverages both compressive and generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.