UVCGAN v2: An Improved Cycle-Consistent GAN for Unpaired Image-to-Image
Translation
- URL: http://arxiv.org/abs/2303.16280v3
- Date: Fri, 22 Sep 2023 16:59:08 GMT
- Authors: Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, Jin Huang,
Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren
- Abstract summary: An unpaired image-to-image (I2I) translation technique seeks to find a mapping between two domains of data in a fully unsupervised manner.
DMs hold state-of-the-art status on I2I translation benchmarks in terms of Fréchet inception distance (FID).
This work improves a recent UVCGAN model and equips it with modern advancements in model architectures and training procedures.
- Score: 10.689788782893096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An unpaired image-to-image (I2I) translation technique seeks to find a
mapping between two domains of data in a fully unsupervised manner. While
initial solutions to the I2I problem were provided by generative adversarial
neural networks (GANs), diffusion models (DMs) currently hold the
state-of-the-art status on the I2I translation benchmarks in terms of Fréchet
inception distance (FID). Yet, DMs suffer from limitations, such as not using
data from the source domain during training or maintaining consistency of
the source and translated images only via simple pixel-wise errors. This work
improves a recent UVCGAN model and equips it with modern advancements in model
architectures and training procedures. The resulting revised model
significantly outperforms other advanced GAN- and DM-based competitors on a
variety of benchmarks. In the case of Male-to-Female translation of CelebA, the
model achieves more than 40% improvement in FID score compared to the
state-of-the-art results. This work also demonstrates the ineffectiveness of
the pixel-wise I2I translation faithfulness metrics and suggests their
revision. The code and trained models are available at
https://github.com/LS4GAN/uvcgan2
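
The cycle-consistency constraint at the core of UVCGAN-style models, and the simple pixel-wise error the abstract refers to, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the "generators" G and F below are stand-in element-wise functions rather than neural networks, and images are flat lists of pixel values.

```python
# Toy sketch of the CycleGAN-style cycle-consistency objective.
# G and F stand in for the source->target and target->source generators;
# here they are exact inverses, so the cycle loss is zero.

def G(x):                      # source -> target "generator" (toy: scale by 2)
    return [2.0 * v for v in x]

def F(y):                      # target -> source "generator" (toy: scale by 0.5)
    return [0.5 * v for v in y]

def l1(a, b):
    """Mean absolute error -- the simple pixel-wise distance the abstract mentions."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def cycle_loss(x, y):
    """L_cyc = |F(G(x)) - x|_1 + |G(F(y)) - y|_1."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)

x = [0.1, 0.4, 0.9]            # a "source-domain image"
y = [0.3, 0.7, 0.2]            # a "target-domain image"
print(cycle_loss(x, y))        # 0.0, since F inverts G exactly
```

In a real model, G and F are trained networks and this loss is added to the adversarial terms; the translated image stays faithful to the input because the round trip through both generators must reconstruct it.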
Related papers
- Reliable Multi-modal Medical Image-to-image Translation Independent of Pixel-wise Aligned Data [2.328200803738193]
We develop a novel multi-modal medical image-to-image translation model independent of pixel-wise aligned data (MITIA)
MITIA achieves superior performance compared to six other state-of-the-art image-to-image translation methods.
arXiv Detail & Related papers (2024-08-26T13:45:58Z) - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Unpaired Translation (CUT) for latent diffusion models (LDMs).
Our approach enables zero-shot image-to-image translation and neural radiance field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z) - DiffI2I: Efficient Diffusion Model for Image-to-Image Translation [108.82579440308267]
The Diffusion Model (DM) has emerged as the SOTA approach for image synthesis.
However, DMs do not perform well on some image-to-image translation (I2I) tasks.
DiffI2I comprises three key components: a compact I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer) and a denoising network.
arXiv Detail & Related papers (2023-08-26T05:18:23Z) - Guided Image-to-Image Translation by Discriminator-Generator
Communication [71.86347329356244]
The goal of image-to-image (I2I) translation is to transfer an image from a source domain to a target domain.
One major branch of this research formulates I2I translation based on Generative Adversarial Networks (GANs).
arXiv Detail & Related papers (2023-03-07T02:29:36Z) - Shifted Diffusion for Text-to-image Generation [65.53758187995744]
Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text.
Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks.
arXiv Detail & Related papers (2022-11-24T03:25:04Z) - Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z) - Fine-Tuning StyleGAN2 For Cartoon Face Generation [0.0]
We propose a novel image-to-image translation method that generates images of the target domain by fine-tuning a pretrained StyleGAN2 model.
The StyleGAN2 model is suitable for unsupervised I2I translation on unbalanced datasets.
arXiv Detail & Related papers (2021-06-22T14:00:10Z) - Dual Contrastive Learning for Unsupervised Image-to-Image Translation [16.759958400617947]
Unsupervised image-to-image translation tasks aim to find a mapping between a source domain X and a target domain Y from unpaired training data.
Contrastive learning for Unpaired image-to-image Translation yields state-of-the-art results.
We propose a novel method based on contrastive learning and a dual learning setting to infer an efficient mapping between unpaired data.
arXiv Detail & Related papers (2021-04-15T18:00:22Z) - Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2
Network [73.5062435623908]
We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
arXiv Detail & Related papers (2020-10-12T13:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or information and is not responsible for any consequences of their use.