ITTR: Unpaired Image-to-Image Translation with Transformers
- URL: http://arxiv.org/abs/2203.16015v1
- Date: Wed, 30 Mar 2022 02:46:12 GMT
- Title: ITTR: Unpaired Image-to-Image Translation with Transformers
- Authors: Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang
- Abstract summary: We propose an effective and efficient architecture for unpaired Image-to-Image Translation with Transformers (ITTR).
ITTR has two main designs: 1) a hybrid perception block (HPB) for token mixing from different receptive fields to utilize global semantics; 2) dual pruned self-attention (DPSA) to sharply reduce the computational complexity.
Our ITTR outperforms state-of-the-art methods for unpaired image-to-image translation on six benchmark datasets.
- Score: 34.118637795470875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unpaired image-to-image translation aims to translate an image from a
source domain to a target domain without paired training data. By utilizing
CNNs to extract local semantics, various techniques have been developed to
improve translation performance. However, CNN-based generators lack the
ability to capture the long-range dependencies needed to fully exploit global
semantics. Recently, Vision Transformers have been widely investigated for
recognition tasks. Though appealing, simply transferring a recognition-based
vision transformer to image-to-image translation is inappropriate due to the
difficulty of generation and the computational limitations. In this paper, we
propose an effective and efficient architecture for unpaired Image-to-Image
Translation with Transformers (ITTR). It has two main designs: 1) a hybrid
perception block (HPB) for token mixing from different receptive fields to
utilize global semantics; 2) dual pruned self-attention (DPSA) to sharply
reduce the computational complexity. Our ITTR outperforms state-of-the-art
methods for unpaired image-to-image translation on six benchmark datasets.
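As a rough illustration of the DPSA idea, the sketch below scores key tokens by an L1 importance measure and keeps only the top-k before attending, cutting the attention map from N×N to N×k. The scoring rule and the choice to prune only the key/value side are simplifications for brevity (the paper prunes along both axes, hence "dual"), not ITTR's exact formulation.

```python
import torch
import torch.nn.functional as F

def pruned_self_attention(x, w_q, w_k, w_v, k_keep):
    """Sketch: prune keys/values to the k_keep highest-scoring tokens
    before attending, so cost scales with N * k_keep instead of N^2."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (B, N, C)

    # Score each key token by L1 magnitude (an assumed importance
    # measure) and keep only the top-k along the token axis.
    scores = k.abs().sum(dim=-1)                  # (B, N)
    top = scores.topk(k_keep, dim=1).indices      # (B, k_keep)
    idx = top.unsqueeze(-1).expand(-1, -1, k.size(-1))
    k = k.gather(1, idx)                          # (B, k_keep, C)
    v = v.gather(1, idx)                          # (B, k_keep, C)

    attn = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (B, N, k_keep)
    return F.softmax(attn, dim=-1) @ v            # (B, N, C)
```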
Related papers
- MxT: Mamba x Transformer for Image Inpainting [11.447968918063335]
Image inpainting aims to restore missing or damaged regions of images with semantically coherent content.
We introduce MxT, composed of the proposed Hybrid Module (HM), which combines Mamba with the transformer in a synergistic manner.
Our HM facilitates dual-level interaction learning at both pixel and patch levels, greatly enhancing the model's ability to reconstruct images with high quality and contextual accuracy.
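A minimal sketch of what such a hybrid block could look like: a multi-head self-attention branch for patch-level context, and a simple gated linear scan standing in for Mamba's selective state-space model at the sequence level. The stand-in scan, the additive fusion, and the layer sizes are all assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Illustrative hybrid of an attention branch and a recurrent scan branch."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, dim)   # per-token decay for the scan
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                 # x: (B, N, C)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)  # global token mixing

        # Gated cumulative scan: state = g * state + (1 - g) * token,
        # a lightweight stand-in for a selective state-space scan.
        g = torch.sigmoid(self.gate(h))
        state = torch.zeros_like(h[:, 0])
        scanned = []
        for t in range(h.size(1)):
            state = g[:, t] * state + (1 - g[:, t]) * h[:, t]
            scanned.append(state)
        scan_out = torch.stack(scanned, dim=1)

        return x + self.proj(attn_out + scan_out)  # residual fusion
```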
arXiv Detail & Related papers (2024-07-23T02:21:11Z)
- SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation [18.93434486338439]
SCONE-GAN is shown to be effective for learning to generate realistic and diverse scenery images.
For more realistic and diverse image generation, we introduce a style reference image.
We validate the proposed algorithm for image-to-image translation and stylizing outdoor images.
arXiv Detail & Related papers (2023-11-07T10:29:16Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
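A sketch of the dense/sparse pairing described above: dense attention runs inside non-overlapping windows, while sparse attention groups tokens sampled at a fixed interval so each group spans the whole image. The exact grouping below is an assumption in the spirit of ART, not its verified layout.

```python
import torch
import torch.nn.functional as F

def attend(x):
    """Plain scaled dot-product self-attention over (groups, tokens, C)."""
    a = F.softmax(x @ x.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
    return a @ x

def dense_attention(x, h, w, win):
    """Attention inside non-overlapping win x win windows (win divides h, w)."""
    b, n, c = x.shape                      # n == h * w
    x = x.view(b, h // win, win, w // win, win, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, c)
    x = attend(x)
    x = x.view(b, h // win, w // win, win, win, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, n, c)

def sparse_attention(x, h, w, s):
    """Attention among tokens on a dilated grid with stride s (s divides
    h, w), so each group reaches across the entire feature map."""
    b, n, c = x.shape
    x = x.view(b, h // s, s, w // s, s, c)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(-1, (h // s) * (w // s), c)
    x = attend(x)
    x = x.view(b, s, s, h // s, w // s, c)
    return x.permute(0, 3, 1, 4, 2, 5).reshape(b, n, c)
```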
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Leveraging in-domain supervision for unsupervised image-to-image translation tasks via multi-stream generators [4.726777092009554]
We introduce two techniques to incorporate this in-domain prior knowledge to improve translation quality.
We propose splitting the input data according to semantic masks, explicitly guiding the network to behave differently in different regions of the image.
In addition, we propose training a semantic segmentation network alongside the translation task and leveraging its output as a loss term that improves robustness.
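A minimal sketch of the mask-splitting idea, assuming one generator stream per semantic class and soft masks that sum to one: each region is translated by its own stream and the outputs are recomposed with the masks. Stream depth and the merge rule are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiStreamGenerator(nn.Module):
    """Route each semantic region through its own translation stream."""

    def __init__(self, n_streams: int, ch: int = 64):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
            )
        self.streams = nn.ModuleList(stream() for _ in range(n_streams))

    def forward(self, img, masks):
        # img: (B, 3, H, W); masks: (B, S, H, W), soft, summing to 1
        # over the stream dimension S.
        out = torch.zeros_like(img)
        for s, net in enumerate(self.streams):
            m = masks[:, s : s + 1]       # (B, 1, H, W)
            out = out + m * net(img * m)  # translate each region separately
        return out
```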
arXiv Detail & Related papers (2021-12-30T15:29:36Z)
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated using the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with its softmax weighting and keeps only the query-key similarity.
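A toy sketch of the softmax-free idea: score a pair of images with raw query-key dot products, with no attention weighting or value aggregation. The max-then-sum pooling is an assumed aggregation, not the paper's exact head.

```python
import torch

def qk_matching_score(query_feats: torch.Tensor,
                      gallery_feats: torch.Tensor) -> torch.Tensor:
    """query_feats: (Nq, C), gallery_feats: (Ng, C) token embeddings
    from the two images; returns a scalar matching score."""
    sim = query_feats @ gallery_feats.t()  # (Nq, Ng) raw similarities
    best = sim.max(dim=1).values           # best gallery match per query token
    return best.sum()                      # aggregate to an image-level score
```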
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
- MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [0.0]
We propose a new unpaired image-to-image translation model called MixerGAN.
We show that MixerGAN achieves competitive results when compared to prior convolution-based methods.
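MixerGAN replaces attention with MLP-Mixer-style blocks; a standard such block is sketched below, assuming the usual token-mixing/channel-mixing split (hidden sizes are illustrative). One MLP mixes across spatial positions, another across channels, so every patch can influence every other without attention or convolution.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Standard MLP-Mixer block: token mixing, then channel mixing."""

    def __init__(self, n_tokens: int, dim: int, hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(n_tokens, hidden), nn.GELU(), nn.Linear(hidden, n_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):  # x: (B, N, C)
        # Token mixing acts on the transposed (B, C, N) view.
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))
```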
arXiv Detail & Related papers (2021-05-28T21:12:52Z)
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [128.96032932640364]
We propose a new Tokens-To-Token Vision Transformer (T2T-ViT) to solve vision tasks.
T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 2.5% improvement when trained from scratch on ImageNet.
For example, T2T-ViT with ResNet50 comparable size can achieve 80.7% top-1 accuracy on ImageNet.
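A sketch of the tokens-to-token step that gives the model its name: tokens are restructured into a 2-D map and soft-split with overlapping unfolding, so each new token aggregates a neighborhood and the sequence shrinks progressively. Kernel size, stride, and padding below are illustrative choices.

```python
import torch.nn.functional as F

def tokens_to_token(tokens, h, w, k=3, stride=2):
    """tokens: (B, N, C) with N == h * w; returns (B, N', C * k * k)
    where N' < N because of the strided, overlapping soft split."""
    b, n, c = tokens.shape
    img = tokens.transpose(1, 2).reshape(b, c, h, w)   # restructurization
    patches = F.unfold(img, kernel_size=k, stride=stride, padding=k // 2)
    return patches.transpose(1, 2)                     # soft split
```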
arXiv Detail & Related papers (2021-01-28T13:25:28Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
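A sketch of how translation with a latent-space EBM typically proceeds, assuming a pretrained `encoder`/`decoder` pair and a learned `energy` network: encode the source image, run Langevin dynamics on the latent so the energy pulls it toward the target domain, then decode. The update rule is generic Langevin sampling; step count and step size are placeholders.

```python
import torch

def latent_ebm_translate(encoder, decoder, energy, x_src,
                         steps: int = 60, step_size: float = 0.1):
    """Langevin dynamics in the autoencoder's latent space."""
    z = encoder(x_src).detach()
    for _ in range(steps):
        z = z.requires_grad_(True)
        grad = torch.autograd.grad(energy(z).sum(), z)[0]
        # z <- z - (eps / 2) * grad E(z) + sqrt(eps) * noise
        z = (z - 0.5 * step_size * grad
             + step_size ** 0.5 * torch.randn_like(z)).detach()
    return decoder(z)
```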
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network [73.5062435623908]
We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
arXiv Detail & Related papers (2020-10-12T13:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.