UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired
image-to-image translation
- URL: http://arxiv.org/abs/2203.02557v1
- Date: Fri, 4 Mar 2022 20:27:16 GMT
- Title: UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired
image-to-image translation
- Authors: Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo,
Meifeng Lin, Brett Viren, Yihui Ren
- Abstract summary: Image-to-image translation has broad applications in art, design, and scientific simulations.
This work examines if equipping CycleGAN with a vision transformer (ViT) and employing advanced generative adversarial network (GAN) training techniques can achieve better performance.
- Score: 7.998209482848582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image-to-image translation has broad applications in art, design, and
scientific simulations. The original CycleGAN model emphasizes one-to-one
mapping via a cycle-consistent loss, while more recent works promote
one-to-many mapping to boost the diversity of the translated images. With
scientific simulation and one-to-one needs in mind, this work examines if
equipping CycleGAN with a vision transformer (ViT) and employing advanced
generative adversarial network (GAN) training techniques can achieve better
performance. The resulting UNet ViT Cycle-consistent GAN (UVCGAN) model is
compared with previous best-performing models on open benchmark image-to-image
translation datasets, Selfie2Anime and CelebA. UVCGAN performs better and
retains a strong correlation between the original and translated images. An
accompanying ablation study shows that the gradient penalty and BERT-like
pre-training also contribute to the improvement. To promote reproducibility and
open science, the source code, hyperparameter configurations, and pre-trained
model will be made available at: https://github.com/LS4GAN/uvcgan.
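As a rough illustration of the two training ingredients the abstract's ablation highlights, here is a minimal PyTorch-style sketch of a cycle-consistency loss and a WGAN-GP-style gradient penalty. The module names (gen_ab, gen_ba, disc) are hypothetical stand-ins; this is not the authors' implementation, and the paper's exact loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(gen_ab, gen_ba, real_a, real_b, weight=10.0):
    """L1 penalty forcing A -> B -> A (and B -> A -> B) to reconstruct the input."""
    rec_a = gen_ba(gen_ab(real_a))  # A -> B -> A
    rec_b = gen_ab(gen_ba(real_b))  # B -> A -> B
    return weight * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))

def gradient_penalty(disc, real, fake, weight=10.0):
    """WGAN-GP-style penalty on the discriminator's gradient norm,
    evaluated at random interpolates of real and (detached) fake samples."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake.detach()).requires_grad_(True)
    scores = disc(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return weight * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
```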
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
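The in-domain inversion entry above pairs an encoder with latent optimization: the encoder's output initializes the code, which is then refined against a reconstruction loss plus a regularizer keeping it in the encoder's domain. The sketch below is a simplification under assumed, frozen `generator` and `encoder` modules; the paper's actual loss terms differ.

```python
import torch
import torch.nn.functional as F

def invert(generator, encoder, image, steps=100, lr=0.01, reg_weight=2.0):
    """Two-stage GAN inversion sketch: encoder initialization, then
    optimization of the latent code against a reconstruction loss."""
    latent = encoder(image).detach().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)  # only the code is updated
    for _ in range(steps):
        recon = generator(latent)
        # Reconstruction term, plus a regularizer that keeps the refined
        # code consistent with what the (in-domain) encoder would predict.
        loss = F.mse_loss(recon, image) \
             + reg_weight * F.mse_loss(encoder(recon), latent)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent.detach()
```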
- Semantic Image Synthesis with Semantically Coupled VQ-Model [42.19799555533789]
We conditionally synthesize the latent space from a vector quantized model (VQ-model) pre-trained to autoencode images.
We show that our model improves semantic image synthesis using autoregressive models on popular semantic image datasets ADE20k, Cityscapes and COCO-Stuff.
arXiv Detail & Related papers (2022-09-06T14:37:01Z)
- Vector-quantized Image Modeling with Improved VQGAN [93.8443646643864]
We propose a Vector-quantized Image Modeling approach that involves pretraining a Transformer to predict image tokens autoregressively.
We first propose multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity.
When trained on ImageNet at 256×256 resolution, we achieve Inception Score (IS) of 175.1 and Fréchet Inception Distance (FID) of 4.17, a dramatic improvement over the vanilla VQGAN.
arXiv Detail & Related papers (2021-10-09T18:36:00Z)
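The vector-quantized modeling entry above describes a two-stage pipeline: quantize images into discrete tokens, then train a Transformer to predict those tokens autoregressively. Here is a schematic sketch of the second stage under assumed interfaces (a `vq_encoder` yielding integer code ids and a causal `transformer` returning next-token logits); conditioning and start tokens are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def token_prediction_loss(vq_encoder, transformer, images):
    """Next-token cross-entropy over the discrete codes of a VQ autoencoder."""
    with torch.no_grad():
        tokens = vq_encoder(images)          # (batch, seq) integer code ids
    logits = transformer(tokens[:, :-1])     # causal model predicts the next id
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), # (batch*(seq-1), vocab)
        tokens[:, 1:].reshape(-1),           # shifted targets
    )
```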
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes.
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
- ViTGAN: Training GANs with Vision Transformers [46.769407314698434]
Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
We introduce several novel regularization techniques for training GANs with ViTs.
Our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets.
arXiv Detail & Related papers (2021-07-09T17:59:30Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- You Only Need Adversarial Supervision for Semantic Image Synthesis [84.83711654797342]
We propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results.
We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv Detail & Related papers (2020-12-08T23:00:48Z)
- Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation [39.55651747758391]
We propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models.
RAL also empowers the collaboration between different modules of the VQ-VAE framework.
The proposed method achieves state-of-the-art results on CelebA for 64×64 image resolution.
arXiv Detail & Related papers (2020-07-20T08:10:07Z)
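The entry above applies policy-gradient training to an autoregressive image model, with a discriminator supplying the reward. The bare-bones REINFORCE sketch below assumes a hypothetical `model.sample` API and a `decoder` mapping codes to images; the actual RAL method, including its coupling with the VQ-VAE modules, is more involved.

```python
import torch

def reinforce_step(model, decoder, discriminator, batch_size, seq_len):
    """REINFORCE sketch: sample discrete codes from the autoregressive model,
    decode them to images, and use discriminator scores as the reward."""
    # Assumed API: `tokens` (batch, seq) code ids, `log_probs` (batch, seq)
    # per-step log-probabilities of the sampled ids.
    tokens, log_probs = model.sample(batch_size, seq_len)
    with torch.no_grad():
        reward = discriminator(decoder(tokens)).squeeze(-1)  # (batch,)
    baseline = reward.mean()  # simple baseline for variance reduction
    loss = -((reward - baseline) * log_probs.sum(dim=1)).mean()
    return loss
```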
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.