CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for
Image Manipulation
- URL: http://arxiv.org/abs/2310.13165v2
- Date: Sat, 9 Mar 2024 20:58:55 GMT
- Title: CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for
Image Manipulation
- Authors: Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai
- Abstract summary: Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation.
This paper introduces CycleNet, a novel yet simple method that incorporates cycle consistency into DMs to regularize image manipulation.
- Score: 57.836686457542385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks
but lack an intuitive interface for consistent image-to-image (I2I)
translation. Various methods have been explored to address this issue,
including mask-based methods, attention-based methods, and image conditioning.
However, it remains a critical challenge to enable unpaired I2I translation
with pre-trained DMs while maintaining satisfactory consistency. This paper
introduces CycleNet, a novel yet simple method that incorporates cycle
consistency into DMs to regularize image manipulation. We validate CycleNet on
unpaired I2I tasks of different granularities. Beyond scene- and object-level
translation, we additionally contribute a multi-domain I2I translation
dataset for studying the physical state changes of objects. Our empirical
studies show that CycleNet is superior in translation consistency and quality,
and can generate high-quality images for out-of-domain distributions with a
simple change of the textual prompt. CycleNet is a practical framework that is
robust even with very limited training data (around 2k images) and requires
minimal computational resources (a single GPU) to train. Project homepage:
https://cyclenetweb.github.io/
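The core idea, regularizing a text-guided translation with a round-trip reconstruction penalty, can be illustrated with a short sketch. The toy conditional ConvNet below is a hypothetical stand-in for a text-conditioned diffusion sampler, chosen only so the example runs end to end; it does not reproduce CycleNet's actual architecture or training objective, and all names in it are illustrative.

```python
# Minimal, self-contained sketch of cycle-consistency regularization for
# unpaired I2I translation. NOT CycleNet's actual method: the tiny
# conditional ConvNet stands in for a text-conditioned diffusion sampler.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTranslator(nn.Module):
    """Stand-in translator conditioned on a target-domain embedding
    (playing the role a textual prompt would play for a diffusion model)."""
    def __init__(self, channels=3, embed_dim=8):
        super().__init__()
        self.embed = nn.Embedding(2, embed_dim)  # two domains: 0 and 1
        self.net = nn.Sequential(
            nn.Conv2d(channels + embed_dim, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, domain):
        # Broadcast the domain embedding over the spatial dimensions.
        e = self.embed(domain)[:, :, None, None].expand(-1, -1, *x.shape[2:])
        return self.net(torch.cat([x, e], dim=1))

def cycle_consistency_loss(model, x_src, src_dom, tgt_dom):
    x_tgt = model(x_src, tgt_dom)  # forward translation: source -> target
    x_rec = model(x_tgt, src_dom)  # backward translation: the round trip
    # The round trip should reproduce the input; this reconstruction
    # penalty is the cycle-consistency regularizer.
    return F.l1_loss(x_rec, x_src)

model = ToyTranslator()
x = torch.rand(4, 3, 64, 64)            # a batch of unpaired source images
src = torch.zeros(4, dtype=torch.long)  # source-domain ids
tgt = torch.ones(4, dtype=torch.long)   # target-domain ids
loss = cycle_consistency_loss(model, x, src, tgt)
loss.backward()
print(f"cycle-consistency loss: {loss.item():.4f}")
```

The design point the sketch preserves is that consistency is enforced by a loss on the round trip rather than by masks or attention manipulation, which is what distinguishes this family of methods from the mask-based and attention-based approaches the abstract mentions.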
Related papers
- Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations [8.248839892711478]
Deep neural networks that achieve remarkable performance in image classification can be easily fooled by tiny transformations.
We show that existing approaches still fall short of robustly handling 'natural' image translations that simulate a subtle change in camera orientation.
We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency.
arXiv Detail & Related papers (2024-04-10T16:39:50Z)
- Cross-Domain Image Conversion by CycleDM [6.7113569772720565]
We propose a novel unpaired image-to-image domain conversion method, CycleDM, which incorporates the concept of CycleGAN into the diffusion model.
CycleDM has two internal conversion models that bridge the denoising processes of two image domains.
Our experiments for evaluating the converted images quantitatively and qualitatively found that ours performs better than other comparable approaches.
arXiv Detail & Related papers (2024-03-05T12:35:55Z)
- Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution [62.4972011636884]
We propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework.
We exploit the decomposed content feature and appearance adaptive convolution to translate an image into a target appearance.
We show that the proposed method produces visually diverse and plausible results in multiple domains compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-02-06T14:12:34Z)
- Leveraging in-domain supervision for unsupervised image-to-image translation tasks via multi-stream generators [4.726777092009554]
We introduce two techniques that incorporate in-domain prior knowledge to improve translation quality.
We propose splitting the input data according to semantic masks, explicitly guiding the network to different behavior for the different regions of the image.
In addition, we propose training a semantic segmentation network alongside the translation task and leveraging its output as a loss term that improves robustness.
arXiv Detail & Related papers (2021-12-30T15:29:36Z)
- Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid [102.24539566851809]
Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task.
Recent image inpainting models have made significant progress in generating vivid visual details, but they can still lead to texture blurring or structural distortions.
We propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors can greatly benefit the recovery of locally missing content in images.
arXiv Detail & Related papers (2021-12-08T04:33:33Z)
- USIS: Unsupervised Semantic Image Synthesis [9.613134538472801]
We propose a new unsupervised paradigm for semantic image synthesis (USIS).
USIS learns to output images with visually separable semantic classes using a self-supervised segmentation loss.
In order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole image wavelet-based discrimination.
arXiv Detail & Related papers (2021-09-29T20:48:41Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source content while translating it into the discriminative style of a target domain.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
(A toy sketch of this latent-space energy mechanism appears after this list.)
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network [73.5062435623908]
We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
arXiv Detail & Related papers (2020-10-12T13:51:40Z)
- Semi-supervised Learning for Few-shot Image-to-Image Translation [89.48165936436183]
We propose a semi-supervised method for few-shot image translation, called SEMIT.
Our method achieves excellent results on four different datasets using as little as 10% of the source labels.
arXiv Detail & Related papers (2020-03-30T22:46:49Z)
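As noted in the Latent Energy Transport entry above, the mechanism behind that approach is Langevin updates in the latent space of a pretrained autoencoder, driven by an energy function. The sketch below illustrates that general mechanism only; the untrained encoder, decoder, and energy network are illustrative stand-ins, and the paper's actual models and hyperparameters are not reproduced.

```python
# Minimal sketch of latent-space Langevin updates driven by an energy-based
# model, the general mechanism behind latent energy transport. The untrained
# networks below are stand-ins so the example is self-contained.
import torch
import torch.nn as nn

latent_dim = 128
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 3 * 32 * 32),
                        nn.Unflatten(1, (3, 32, 32)))
energy = nn.Sequential(nn.Linear(latent_dim, 256), nn.SiLU(),
                       nn.Linear(256, 1))

def langevin_translate(x, steps=60, step_size=0.01, noise_scale=0.005):
    """Move a source latent toward low-energy (target-domain) regions,
    then decode it back to image space."""
    z = encoder(x).detach()
    for _ in range(steps):
        z.requires_grad_(True)
        e = energy(z).sum()
        grad, = torch.autograd.grad(e, z)
        # Gradient descent on the energy plus Gaussian noise: one
        # Langevin dynamics step in latent space.
        z = (z - step_size * grad
             + noise_scale * torch.randn_like(z)).detach()
    return decoder(z)

x_src = torch.rand(2, 3, 32, 32)   # toy source-domain images
x_tgt = langevin_translate(x_src)
print(x_tgt.shape)  # torch.Size([2, 3, 32, 32])
```

Because only the lightweight energy network is trained in this scheme while the autoencoder is reused, the approach scales to high resolutions, which is consistent with the 1024×1024 claim in the entry above.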
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.