Diffusion-based Image Translation using Disentangled Style and Content
Representation
- URL: http://arxiv.org/abs/2209.15264v1
- Date: Fri, 30 Sep 2022 06:44:37 GMT
- Title: Diffusion-based Image Translation using Disentangled Style and Content
Representation
- Authors: Gihyun Kwon, Jong Chul Ye
- Abstract summary: Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
- Score: 51.188396199083336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based image translation guided by semantic texts or a single target
image has enabled flexible style transfer which is not limited to the specific
domains. Unfortunately, due to the stochastic nature of diffusion models, it is
often difficult to maintain the original content of the image during the
reverse diffusion. To address this, here we present a novel diffusion-based
unsupervised image translation method using disentangled style and content
representation.
Specifically, inspired by the splicing Vision Transformer, we extract
intermediate keys of the multi-head self-attention layer from a ViT model and use
them for the content preservation loss. Then, image-guided style transfer is
performed by matching the [CLS] classification token from the denoised samples
and target image, whereas an additional CLIP loss is used for the text-driven
style transfer. To further accelerate the semantic change during the reverse
diffusion, we also propose a novel semantic divergence loss and resampling
strategy. Our experimental results show that the proposed method outperforms
state-of-the-art baseline models in both text-guided and image-guided
translation tasks.
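The abstract names two guidance terms but does not give their formulas. As a rough illustration only, the sketch below assumes the content term compares cosine self-similarity matrices of per-token ViT keys (the structure term used in the splicing ViT line of work) and the style term is a mean squared distance between [CLS] token vectors. All function names, the weights `w_content` and `w_style`, and the toy vectors are hypothetical; real keys and tokens would come from a pretrained ViT, which is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def self_similarity(keys):
    """Pairwise cosine similarity between per-token key vectors."""
    return [[cosine(a, b) for b in keys] for a in keys]


def content_loss(src_keys, out_keys):
    """Assumed form: mean squared difference of key self-similarity matrices."""
    s, o = self_similarity(src_keys), self_similarity(out_keys)
    n = len(s)
    return sum((s[i][j] - o[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def style_loss(out_cls, target_cls):
    """Assumed form: mean squared distance between two [CLS] token vectors."""
    return sum((a - b) ** 2 for a, b in zip(out_cls, target_cls)) / len(out_cls)


def total_loss(src_keys, out_keys, out_cls, target_cls, w_content=1.0, w_style=1.0):
    """Weighted sum of the two terms; the weights are illustrative assumptions."""
    return (w_content * content_loss(src_keys, out_keys)
            + w_style * style_loss(out_cls, target_cls))


# Toy stand-ins for ViT features: two tokens with two-dimensional keys.
src_keys = [[1.0, 0.0], [0.0, 1.0]]
same_layout = [[2.0, 0.0], [0.0, 3.0]]   # rescaled keys, same token relationships
collapsed = [[1.0, 0.0], [1.0, 0.0]]     # both tokens now point the same way
```

In this toy setup, `content_loss(src_keys, same_layout)` is zero because cosine self-similarity ignores the scale of each key, while `content_loss(src_keys, collapsed)` is positive because the relationship between the two tokens has changed. That scale invariance is one reason a self-similarity form is a plausible choice for preserving layout while appearance changes.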
Related papers
- StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation [18.213286385769525]
CycleGAN-based methods are known to hide the mismatched information in the generated images to bypass cycle consistency objectives.
We introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images.
Our approach enhances the semantic consistency of the translated images without requiring additional postprocessing or supervision.
arXiv Detail & Related papers (2024-03-29T12:23:58Z)
- Diffusion-based Image Translation with Label Guidance for Domain
Adaptive Semantic Segmentation [35.44771460784343]
Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS).
Existing methods still struggle to preserve semantically-consistent local details between the original and translated images.
We present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation.
arXiv Detail & Related papers (2023-08-23T18:01:01Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-diffusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
arXiv Detail & Related papers (2023-05-31T06:59:21Z)
- Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z)
- Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style
Transfer [38.957512116073616]
We propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks.
Our method can generate images with the same semantic content as the source image in a zero-shot manner.
arXiv Detail & Related papers (2023-03-15T13:47:02Z)
- DSI2I: Dense Style for Unpaired Image-to-Image Translation [70.93865212275412]
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar.
We propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information.
Our results show that the translations produced by our approach are more diverse, preserve the source content better, and are closer to the exemplars when compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-26T18:45:25Z)
- Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts.
We generate captions from the limited training images and use these captions to edit the training images with an image-to-image Stable Diffusion model.
This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z)
- Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
arXiv Detail & Related papers (2022-04-01T13:55:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.