Multi-Channel Attention Selection GANs for Guided Image-to-Image
Translation
- URL: http://arxiv.org/abs/2002.01048v2
- Date: Thu, 6 Oct 2022 05:36:32 GMT
- Title: Multi-Channel Attention Selection GANs for Guided Image-to-Image
Translation
- Authors: Hao Tang, Philip H.S. Torr, Nicu Sebe
- Abstract summary: We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation.
The proposed framework and modules are unified solutions and can be applied to solve other generation tasks such as semantic image synthesis.
- Score: 148.9985519929653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel model named Multi-Channel Attention Selection Generative
Adversarial Network (SelectionGAN) for guided image-to-image translation, where
we translate an input image into another while respecting an external semantic
guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance
information and consists of two stages. In the first stage, the input image and
the conditional semantic guidance are fed into a cycled semantic-guided
generation network to produce initial coarse results. In the second stage, we
refine the initial results by using the proposed multi-scale spatial pooling &
channel selection module and the multi-channel attention selection module.
Moreover, uncertainty maps automatically learned from attention maps are used
to guide the pixel loss for better network optimization. Exhaustive experiments
on four challenging guided image-to-image translation tasks (face, hand, body,
and street view) demonstrate that our SelectionGAN is able to generate
significantly better results than the state-of-the-art methods. Meanwhile, the
proposed framework and modules are unified solutions and can be applied to
solve other generation tasks such as semantic image synthesis. The code is
available at https://github.com/Ha0Tang/SelectionGAN.
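Below is a minimal PyTorch sketch of the two ideas named in the abstract: the multi-channel attention selection step (several candidate results fused by softmax-normalized attention maps) and an uncertainty-weighted pixel loss. Module names, layer shapes, and the way the uncertainty map is predicted are illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelAttentionSelection(nn.Module):
    """Fuse N candidate generations with softmax-normalized attention maps."""
    def __init__(self, in_channels: int, num_candidates: int = 10):
        super().__init__()
        self.num_candidates = num_candidates
        # N candidate RGB results and N single-channel attention maps.
        self.to_images = nn.Conv2d(in_channels, 3 * num_candidates, 3, padding=1)
        self.to_attention = nn.Conv2d(in_channels, num_candidates, 1)
        # Uncertainty (log-variance) map predicted from the attention maps (assumption).
        self.to_log_var = nn.Conv2d(num_candidates, 1, 3, padding=1)

    def forward(self, features: torch.Tensor):
        b, _, h, w = features.shape
        candidates = torch.tanh(self.to_images(features)).view(b, self.num_candidates, 3, h, w)
        attention = F.softmax(self.to_attention(features), dim=1)     # (B, N, H, W)
        fused = (candidates * attention.unsqueeze(2)).sum(dim=1)      # (B, 3, H, W)
        log_var = self.to_log_var(attention)                          # (B, 1, H, W)
        return fused, log_var

def uncertainty_weighted_l1(pred, target, log_var):
    # Pixel loss attenuated per pixel by the learned uncertainty.
    return ((pred - target).abs() / log_var.exp() + log_var).mean()
```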
Related papers
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial
Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
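A hedged sketch of such a class-wise pixel contrastive loss is shown below: pixel embeddings sharing a semantic label are pulled together and embeddings from other classes are pushed apart. The sampling strategy and temperature are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, num_samples=256, temperature=0.1):
    """embeddings: (B, C, H, W) pixel features; labels: (B, H, W) class ids."""
    c = embeddings.size(1)
    feats = embeddings.permute(0, 2, 3, 1).reshape(-1, c)
    labs = labels.reshape(-1)

    # Randomly sample a subset of pixels to keep the similarity matrix small.
    idx = torch.randperm(feats.size(0), device=feats.device)[:num_samples]
    feats, labs = F.normalize(feats[idx], dim=1), labs[idx]

    sim = feats @ feats.t() / temperature                    # pairwise similarities
    pos = (labs[:, None] == labs[None, :]).float()
    pos.fill_diagonal_(0)                                    # exclude self-pairs

    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)
    # Supervised-contrastive style: average log-prob of positives per anchor.
    return -((pos * log_prob).sum(dim=1) / denom).mean()
```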
arXiv Detail & Related papers (2023-07-22T14:17:19Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard of Generative Adversarial Networks (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Hidden Path Selection Network for Semantic Segmentation of Remote Sensing Images [38.89222641689085]
Semantic segmentation in remote sensing images needs to portray diverse distributions over vast geographical locations.
Several algorithms have been designed to select pixel-wise adaptive forward paths for natural image analysis.
With the help of hidden variables derived from an extra mini-branch, HPS-Net is able to tackle the inherent problem of inaccessible global optima.
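A minimal sketch of pixel-wise adaptive path selection in this spirit is given below: parallel branches are mixed per pixel by a learned soft gate. The branch design and gating mechanism are assumptions for illustration, not HPS-Net's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelwisePathSelection(nn.Module):
    """Mix parallel forward paths with a per-pixel soft gate."""
    def __init__(self, channels: int, num_paths: int = 3):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)  # growing receptive field
            for d in range(1, num_paths + 1)
        ])
        self.gate = nn.Conv2d(channels, num_paths, 1)

    def forward(self, x):
        outs = torch.stack([p(x) for p in self.paths], dim=1)   # (B, P, C, H, W)
        weights = F.softmax(self.gate(x), dim=1)                 # (B, P, H, W) per-pixel path weights
        return (outs * weights.unsqueeze(2)).sum(dim=1)          # per-pixel mixture of paths
```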
arXiv Detail & Related papers (2021-12-09T21:31:13Z)
- DiverGAN: An Efficient and Effective Single-Stage Framework for Diverse Text-to-Image Generation [7.781425222538382]
DiverGAN is a framework to generate diverse, plausible and semantically consistent images according to a natural-language description.
DiverGAN adopts two novel word-level attention modules, i.e., a channel-attention module (CAM) and a pixel-attention module (PAM).
Conditional Adaptive Instance-Layer Normalization (CAdaILN) is introduced to enable the linguistic cues from the sentence embedding to flexibly manipulate the amount of change in shape and texture.
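Below is a minimal sketch of word-level channel attention and pixel attention in the spirit of the CAM/PAM description; pooling choices and layer shapes are assumptions, not DiverGAN's exact modules.

```python
import torch
import torch.nn as nn

class WordChannelAttention(nn.Module):
    """Re-weight feature channels by their relevance to the word embeddings."""
    def __init__(self, channels: int, word_dim: int):
        super().__init__()
        self.proj = nn.Linear(word_dim, channels)

    def forward(self, feats, words):               # feats: (B,C,H,W), words: (B,T,D)
        sentence = words.mean(dim=1)                # crude sentence summary (assumption)
        gate = torch.sigmoid(self.proj(sentence))   # (B, C) channel gates
        return feats * gate[:, :, None, None]

class WordPixelAttention(nn.Module):
    """Re-weight spatial positions by their similarity to the words."""
    def __init__(self, channels: int, word_dim: int):
        super().__init__()
        self.key = nn.Conv2d(channels, word_dim, 1)

    def forward(self, feats, words):
        keys = self.key(feats).flatten(2)               # (B, D, H*W)
        scores = torch.bmm(words, keys)                 # (B, T, H*W) word-to-pixel scores
        gate = torch.sigmoid(scores.max(dim=1).values)  # strongest word per pixel
        return feats * gate.view(feats.size(0), 1, *feats.shape[2:])
```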
arXiv Detail & Related papers (2021-11-17T17:59:56Z)
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching behaviors between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z)
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
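A hedged sketch of this text-guided optimization step is shown below: starting from the inverted latent code, the code is updated so that the generated image matches the text under an image-text similarity model, while staying close to the original code. `generator`, `text_image_similarity`, and all hyperparameters are placeholders, not the paper's exact procedure.

```python
import torch

def optimize_latent(generator, text_image_similarity, z_init, text,
                    steps=200, lr=0.05, reg=0.1):
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = generator(z)
        # Maximize similarity to the target text; the regularizer keeps the code
        # close to the inverted one so the original image content is preserved.
        loss = -text_image_similarity(image, text) + reg * (z - z_init).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```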
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)
- DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation [8.26410341981427]
The Dual Attention Generative Adversarial Network (DTGAN) can synthesize high-quality and semantically consistent images.
The proposed model introduces channel-aware and pixel-aware attention modules that can guide the generator to focus on text-relevant channels and pixels.
A new type of visual loss is utilized to enhance the image resolution by ensuring vivid shape and perceptually uniform color distributions of generated images.
arXiv Detail & Related papers (2020-11-05T08:57:15Z)